CHAPTER 1

INTRODUCTION  AND  OUTLINE  OF  REPORT 


1.1 Introduction

This  report  describes  the  results  of  a  sixteen  month  effort  under  OCA 
Contract  100-79-C-0005  to  develop  and  implement  a  speech  coding  algorithm 
designed  to  produce  very  good  quality  speech  at  9600  bits  per  second.  The 
technique  is  based  on  the  latest  developments  in  speech  digitization  and 
is  formulated  to  comply  with  the  requirements  described  in  the  statement 
of  work. 

The  method  is  based  on  techniques  which  have  been  reported  in  the 
literature  but  which  are  brought  together  here  for  the  first  time.  The 
system  combines  elements  of  adaptive  predictive  coding  and  ADPCM  systems 
and is known as PARC, the Pitch extraction Adaptive Residual Coder. The pitch
extraction loop used in adaptive predictive coding provides the input
to a sequentially adaptive predictor with backward coefficient adaptation,
which forms an estimate of the pitch-reduced signal. The error in
this estimate is quantized by a pitch-compensating adaptive quantizer.

The resulting quantizer output is coded using an adaptive source coding
procedure. The source code also permits transmission of pitch information
and synchronization signals.

This  algorithm  was  implemented  on  a  pair  of  CSPI  MAP  300  signal 
processors  to  generate  a  real-time,  full-duplex  speech  encoding  system. 

This  algorithm  has  several  features  which  are  significant  in  a  full 
system  application.  The  waveform  reconstruction  nature  of  the  algorithm 
provides  excellent  performance  in  tandem  with  CVSD  and  in  the  presence 
of background noise. If bit rates higher than 9600 b/s are permitted,
the algorithm is easily adapted to them. For example, a 16 kb/s version
of  the  same  algorithm  will  differ  only  in  the  number  of  quantization 
levels  and  the  source  code  algorithm.  It  could  easily  be  implemented 
using  the  same  software. 




1.2 Summary of Algorithm Requirements

The  following  requirements  for  the  speech  coding  algorithm  have  been 
determined  from  the  Statement  of  Work. 

1. The speech processing system shall operate at a transmission
data rate of 9600 b/s.

2. The speech processing system shall produce very high quality
speech reproduction. This requirement is interpreted to
mean a signal-to-noise ratio of approximately 20 dB.

3.  The  audio  bandwidth  of  the  speech  coder  shall  be  greater  than 
or  equal  to  3200  Hz. 

4. The speech coder shall produce good quality speech under
conditions  of  a  random  transmission  bit  error  rate  of  1 
percent. 

5. The speech coder shall produce intelligible speech under
conditions of acoustic background noise (60 dB referenced
to 20 µN/m²) such as office noise.

6.  The  speech  coder  shall  perform  satisfactorily  in  tandem 
with  a  CVSD  speech  coder  operating  at  a  data  rate  of  16  kb/s. 
This tandem configuration shall provide speech intelligibility
with minimal degradation compared with a single link
of CVSD operating at 16 kb/s.

The algorithm shall be implemented on a pair of CSPI MAP 300
signal processors in real-time, full-duplex mode with appropriate
synchronization.

1.3 Outline of Report

Following  the  introductory  material  in  this  chapter,  the  next  two 
chapters  of  the  report  describe  the  general  details  of  the  PARC 
algorithm.  Chapter  2  describes  the  final  form  of  the  PARC  speech 
digitization algorithm developed in this study.

The  following  nine  chapters  of  the  report  describe  in  more  depth  the 
details  of  the  algorithm  and  various  studies  that  were  made  during  this 
contract  period  but  which  do  not  necessarily  appear  in  the  final  algorithm. 
Chapter  3  delineates  the  details  of  synchronization  for  full  duplex 
operation.  Chapter  4  describes  various  pitch  extraction  studies,  while 
Chapter  5  describes  details  of  the  tree  coding  studies  that  were  conducted. 

A  special  form  of  the  PARC  algorithm  in  which  backward  adaptation  is  used 
for the pitch extraction operation is described in Chapter 6. The operation
of  the  algorithm  in  tandem  with  CVSD  is  discussed  in  Chapter  7,  while 
Chapter  8  is  concerned  with  transmission  studies.  The  final  algorithm  uses 
an  adaptive  filter  on  the  input  speech  to  improve  subjective  performance. 
This  algorithm  is  described  in  Chapter  9.  Chapter  10  is  concerned  with  the 
buffer control algorithms used in the PARC system, while Chapter 11 is
concerned with source and error control coding for full-duplex operation.

Chapter  12  describes  the  real-time  implementation  of  the  algorithm  on 
the  MAP  processor.  The  remainder  of  the  report  is  a  series  of  appendices 
which  describe  various  details  of  the  programming  of  the  algorithms  in  both 
Fortran  and  on  the  MAP,  as  well  as  various  support  packages. 

CHAPTER 2

ALGORITHM  DESCRIPTION 

2.1 Introduction

This  chapter  will  describe  the  final  form  of  the  PARC  speech  digitization 
algorithm  developed  during  this  study.  The  algorithm  will  be  decomposed 
into  its  constituent  elements  and  each  element  will  be  described  in  turn. 

The  underlying  theory  will  be  discussed  and  the  details  of  the  recommended 
implementation will be presented. Later chapters will describe the
real-time implementation and the various studies which led to the recommended
form  of  the  algorithm. 

The  PARC  digital  speech  communication  system  can  be  represented  as 
shown  in  Fig.  2.1.  The  analog  speech  signal  s(t)  is  converted  to  a 
sequence  of  finely  quantized  samples  s(k).  These  quantized  samples  are 
stored in a buffer whose delay $B_1$ can vary over time. The PARC transmitter
section does the actual data reduction to produce the quantizer level
sequence, represented by $q(k-B_1)$. This sequence, along with
the quantized side information $\beta$ and T, is noiselessly encoded into the bit
stream b(m) for transmission on the channel.

At  the  receiver  end,  the  process  is  essentially  reversed.  The  bit 
stream b'(m) is decoded into the sequence $q'(k-B_1)$ and the side information
$\beta'$ and T'. The primes are used to allow for channel errors. The
PARC receiver device converts this information into reconstructed speech
$\hat{s}(k-B_1)$. This is buffered with a variable delay $B_2$. A D/A output unit
presents a filtered version of the delayed speech to the user. The
overall system delay $B_1+B_2$ is a constant.


The analog filters were designed by GTE under a separate
contract.  They  are  described  elsewhere.  Therefore,  the  discussion  here 
will  begin  after  the  continuous  signal  has  been  converted  to  s(k). 

Figure  2.2  shows  a  more  detailed  block  diagram  of  the  transmitter 
system. The system consists of the following major components:

•  SAMPLE  buffer 

•  Adaptive  Low-pass  filter 

•  Pitch  extraction  loop 

•  Adaptive  Residual  Coder 

•  Noiseless  source  coder. 

As  noted,  the  SAMPLE  buffer  receives  the  incoming  speech  samples  and 
holds  them  for  further  processing.  For  notational  simplicity,  the  samples 
at  the  output  of  this  buffer  will  be  referred  to  as  s(k).  The  adaptive 
low-pass  filter  is  used  as  needed  to  help  prevent  an  overflow  of  the 
SAMPLE buffer. The output of the adaptive low-pass filter, $s_f(k)$, forms
the input to the pitch extraction loop. The pitch extraction loop uses
a block of these filtered samples $s_f(k)$ to estimate the pitch period T and
the correlation coefficient $\beta$. Using this information, the pitch-reduced
speech samples v(k) are calculated by

$$v(k) = s_f(k) - \beta\,\hat{s}(k-T) \tag{2.1}$$

The  pitch-reduced  speech  samples  v(k)  are  then  processed  by  the 
Adaptive Residual Coder. An estimate p(k) of v(k), produced by the adaptive
predictor, is subtracted from v(k) to form the prediction error e(k).

The  prediction  error  e(k)  is  then  passed  through  an  adaptive  quantizer 
to  yield  the  quantizer  level  q(k).  This  quantizer  level  q(k)  is  the 
input to both the inverse quantizer (to update the rest of the system) and
the noiseless source coder (to be transmitted down the channel). The
noiseless source coder combines the quantizer level information q(k),
the pitch period T, and the correlation coefficient $\beta$, and generates the
binary bit stream b(m) to convey this information to the receiver.

The  underlying  design  principle  of  the  adaptation  procedure  is  that 
all  information  used  in  updating  the  inverse  quantizer  and  the  predictor 
be  available  both  at  the  transmitter  and  the  receiver.  This  will  allow 
the  receiver  to  replicate  these  devices.  Since  the  only  information  sent 
from the transmitter to the receiver is the quantizer output and pitch
information, the adaptation procedures for the inverse quantizer and the
predictor must use quantities derivable from them and from a pre-arranged
initial state.

Although  the  PARC  is  basically  a  sequential  system,  the  use  of 
pitch  redundancy  reduction  forces  a  block  structure.  For  each  block,  new 
values of $\beta$ and T are computed. The resulting block structure appears
throughout  the  real-time  implementation  of  the  system. 

Subsequent  sections  will  describe  each  of  the  subsystems  in  the 
transmitter.  The  next  section  explains  the  operation  of  the  transmitter 
buffer  system.  Since  the  noiseless  source  coder  produces  a  variable 
number  of  bits  for  each  sample,  the  rate  at  which  samples  are  processed 
varies  with  time.  Thus,  the  buffering  operation  is  quite  complex  and 
important. 

Section 2.3 describes the implementation of the adaptive low-pass
filter.  The  filter  is  only  activated  when  the  SAMPLE  buffer  is  almost 
full  and  it  is  only  used  to  reduce  the  rate  at  which  bits  are  generated. 
Thus,  its  operation  is  closely  related  to  that  of  the  SAMPLE  buffer. 

The  pitch  extraction  algorithm  is  a  fairly  standard  AMDF  system  and 
is explained in Section 2.4. Although $\beta$ and T are computed for a fixed
number of speech samples, the number of samples they are actually used on
will vary.

The adaptive residual coder is a sophisticated ADPCM system. Section 2.5
describes parametric modifications used to optimize the structure for
pitch-reduced speech and to combat channel errors. A new feature, known as
pitched repetition, has been added to the ARC to reduce the bit rate during
voiced speech. It is described in Section 2.6.

A  central  feature  of  the  PARC  system  is  the  noiseless  source  coding 
structure.  It  allows  the  most  efficient  use  of  all  channel  bits  to  yield 
the  highest  fidelity  speech.  In  the  final  implementation,  it  includes  the 
binary representation of q(k), $\beta$ and T as well as synchronization and error
control.  Section  2.7  gives  the  details  of  this  component. 

The  next-to-last  section  of  this  chapter  describes  the  structure  of  the 
receiver.  It  too  has  a  complex  buffer  structure  which  must  be  considered 
during  the  system  design.  The  final  section  is  a  list  of  references  to  the 
various  techniques  employed  by  PARC. 

2.2 Transmitter Buffer System

The  buffering  of  data  flows  in  the  transmitter  is  a  fairly  complex 
procedure. Due to the variable rate coding procedure, the overall delay
experienced  by  a  speech  sample  traveling  through  the  transmitter  varies 
with  time.  During  unvoiced  speech  or  silence,  the  delay  will  be  short  but 
during  voiced  speech,  it  can  be  over  a  thousand  sample  times.  However, 
the  receiver  must  produce  one  reconstructed  speech  sample  for  each  sample 
that  enters  the  transmitter.  Thus,  if  there  are  no  channel  errors,  the 
overall  system  delay  is  fixed. 

There  are  actually  four  separate  buffers  used  in  the  transmitter 
buffer  system.  Three  are  used  to  facilitate  data  transfer  and  to  allow 
parallel  processing;  one  accommodates  the  variable  delay.  The 
buffers  are  the  ADAM  buffer,  the  SAMPLE  buffer,  the  LEVEL  buffer  and  the 
BIT buffer. They are shown schematically in Fig. 2.3. The ADAM buffer,
the  LEVEL  buffer,  and  the  BIT  buffer  are  each  double  buffers  which  allow 
parallel  data  processing  and  have  essentially  a  fixed  delay.  The  SAMPLE 
buffer,  though,  is  a  large  circular  buffer  which  accommodates  the  system's 
variable-rate encoding.

The  operation  of  the  four  buffers  is  best  described  by  first  explaining 
the  input  and  output  for  each  of  them.  Continuous  speech  signal  enters 
the PARC system through an analog speech interface. This interface conditions
the signal for the Analog Data Acquisition Module (ADAM) by low-pass
filtering it and by adjusting the signal level to the range of the
analog-to-digital converter in the ADAM. The ADAM samples this signal at a rate
of 6.4 kilosamples per second and places the samples into the ADAM buffer.
Thus, the input to the buffer is a sequence of single speech samples at the
rate of one every 156.25 µsec.

Fig. 2.3 Transmitter Buffer System


Data is removed from the ADAM buffer in blocks of 126 samples every
19,687.5 µsec. Thus, data enters and leaves the buffer at an average rate
of 6400 samples per second. The blocks of 126 samples are transferred to
the SAMPLE buffer. The rate at which these samples are processed by the
noise reducer and then are removed from the SAMPLE buffer depends on the
number of bits they generate. The block-time of 19,687.5 µsec was selected
to correspond both to 126 input sampling intervals and to the time allowed
to transmit 189 bits on the channel. Thus, the number of samples $N_B$ removed
from the SAMPLE buffer in one block-time will be the number which generates
at least 189 channel bits. This can range from as many as 500 or more during
silence to fewer than 80 during voiced speech. The SAMPLE buffer must be
large enough to accommodate this kind of variation without introducing undue
delay.
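
The block-time figures quoted above can be checked with a few lines of arithmetic. The following is only an illustrative sketch; the constant names are chosen here and do not come from the report.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 6400      # ADAM sampling rate
CHANNEL_RATE_BPS = 9600    # channel bit rate
SAMPLES_PER_BLOCK = 126    # samples moved out of the ADAM buffer per block
BITS_PER_BLOCK = 189       # channel bits transmitted per block

# Exact rational arithmetic avoids any floating-point rounding.
sample_period_us = Fraction(1_000_000, SAMPLE_RATE_HZ)
block_time_from_samples_us = SAMPLES_PER_BLOCK * sample_period_us
block_time_from_bits_us = Fraction(BITS_PER_BLOCK * 1_000_000, CHANNEL_RATE_BPS)

print(float(sample_period_us))             # 156.25
print(float(block_time_from_samples_us))   # 19687.5
print(block_time_from_samples_us == block_time_from_bits_us)  # True
```

Both routes give the same block-time, which is why 126 samples and 189 bits can share one block clock.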

The samples removed from the SAMPLE buffer in a given block time
are processed by the PARC algorithm and the corresponding quantizer levels
q(k) are generated. The $N_B$ samples will generate $N_B$ quantizer levels.
These are placed in the LEVEL buffer. Since PARC operates on a
sample-by-sample basis, the q(k) are loaded into their buffer one at a time.

The source coder generates full blocks of 189 bits. Therefore, it
removes the $N_B$ quantizer levels from the LEVEL buffer at one time. The
resulting block of bits is stored in the BIT buffer. The bits are clocked out of
this buffer at the rate of 9600 bits per second.

The ADAM buffer, like all double buffers, actually has two half-buffers,
each  of  which  holds  126  speech  samples.  While  the  ADAM  is  filling  one 
half-buffer  with  the  incoming  samples,  the  other  half-buffer  can  be 
emptied  into  the  SAMPLE  buffer.  The  SAMPLE  buffer  is  a  circular  buffer 
which  holds  up  to  1024  speech  samples.  There  are  two  pointers  associated 
with  the  buffer  which  keep  track  of  where  samples  are  to  enter  the  buffer 
and from where they are to leave the buffer. The distance between the
pointers indicates how many samples are currently in the SAMPLE buffer.
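
A two-pointer circular buffer of this kind might be sketched as follows. This is not the MAP 300 implementation; the class and method names are hypothetical, and an explicit sample count is kept (an assumption made here) so that a completely full buffer can be told apart from an empty one.

```python
class SampleBuffer:
    """Illustrative circular buffer with an input and an output pointer."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.data = [0] * capacity
        self.in_ptr = 0    # index where the next sample enters
        self.out_ptr = 0   # index where the next sample leaves
        self.count = 0     # number of samples currently held

    def put_block(self, samples):
        """Append a block, e.g. the 126 samples of one ADAM half-buffer."""
        if self.count + len(samples) > self.capacity:
            raise OverflowError("SAMPLE buffer overflow")
        for s in samples:
            self.data[self.in_ptr] = s
            self.in_ptr = (self.in_ptr + 1) % self.capacity
        self.count += len(samples)

    def get(self, n):
        """Remove and return up to n of the oldest samples."""
        n = min(n, self.count)
        out = [self.data[(self.out_ptr + i) % self.capacity] for i in range(n)]
        self.out_ptr = (self.out_ptr + n) % self.capacity
        self.count -= n
        return out


buf = SampleBuffer()
for _ in range(3):
    buf.put_block(list(range(126)))
oldest = buf.get(80)
print(buf.count)  # 298
```

While the buffer is not completely full, the pointer distance `(in_ptr - out_ptr) % capacity` equals `count`, which is the occupancy measure described above.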

Before  the  samples  are  removed  from  the  SAMPLE  buffer,  three  operations 
are  performed.  If  the  SAMPLE  buffer  is  over  a  pitched  repetition  threshold, 
it  will  signal  the  transmitter  to  do  the  pitched  repetition  as  described 
in  Section  2.6.  If  the  adaptive  low-pass  filter  is  called  for,  the  oldest 
80  samples  in  the  buffer  are  filtered.  The  original  samples  are  replaced 
with the filtered samples. The $\beta$ and T values for de-pitching are then
calculated based on the 80 oldest samples in the buffer. Samples are then
passed to the PARC for processing and the corresponding quantizer levels
are put into the LEVEL buffer. For each sample, the number of information
bits required to represent the quantizer level is computed. The process
will stop when any of the following three conditions occurs:

1.  The  transmitter  has  generated  the  required  157  information  bits. 

2. The transmitter has no more input samples to process, i.e., the SAMPLE
buffer is in an underflow condition. In this case, the transmitter will
output a NULL code to the source encoder.

3. The transmitter has processed the maximum number of samples allowed
for real-time operation.
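
The three stopping conditions can be pictured as a per-block loop. The sketch below is an assumption-laden illustration: `bits_for_sample` stands in for the PARC coder plus source code, `max_samples` is a hypothetical real-time limit whose value the text does not give, and the order in which the conditions are tested is a choice made here.

```python
INFO_BITS_PER_BLOCK = 157  # information bits required per block


def encode_block(pending_samples, bits_for_sample, max_samples):
    """Return (samples_consumed, bits_generated, stop_reason) for one block.

    bits_for_sample is a hypothetical callable giving the number of
    information bits produced by coding one sample.
    """
    consumed = 0
    bits = 0
    while True:
        if bits >= INFO_BITS_PER_BLOCK:
            return consumed, bits, "bits"          # condition 1
        if consumed >= len(pending_samples):
            return consumed, bits, "underflow"     # condition 2: NULL code sent
        if consumed >= max_samples:
            return consumed, bits, "max_samples"   # condition 3
        bits += bits_for_sample(pending_samples[consumed])
        consumed += 1


# 2 bits per sample: 79 samples are needed to reach at least 157 bits.
print(encode_block([0] * 200, lambda s: 2, 500))  # (79, 158, 'bits')
```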

The LEVEL buffer is a large double buffer capable of holding up to
1200 quantizer levels. Even though the buffer can hold 1200 levels, it
will never have more than two blocks, each of which generates 157 information
bits. One half-buffer is used for incoming quantizer levels, while
the other half-buffer is available to the noiseless source coder. The
noiseless source coder takes these quantizer levels and, with the associated
quantized $\beta$ and T, generates a 189-bit block which is placed in the
BIT buffer.

The BIT buffer is a double buffer which holds 378 bits. One half-buffer
receives bits from the channel coder, while the other half-buffer
is  available  for  output  to  the  channel.  Output  takes  place  through  the 
Input/Output  Scroll  (IOS-2).  The  IOS-2  provides  the  bits  to  the  modem 
interface  contained  in  the  speech  interface  unit. 

2.3 Adaptive Low-Pass Filter

Adaptive low-pass filtering is used in the system to provide a soft-failure
capability under certain circumstances. When the SAMPLE buffer
fills  more  rapidly  than  it  is  being  emptied,  for  example  during  voiced 
speech,  it  is  possible  that  it  could  overflow.  It  was  found  that  low-pass 
filtering  the  speech  mitigated  this  problem.  When  the  SAMPLE  buffer  is 
nearly  full,  therefore,  the  speech  is  low-pass  filtered  to  help  guard 
against  buffer  overflow. 

The recommended low-pass filter has been derived from a first order
Butterworth filter, using the bilinear transformation to obtain a digital
filter. The general form of the transfer function of this type of filter
is

$$H(z) = \frac{\omega_{ca} + \omega_{ca} z^{-1}}{(\omega_{ca}+1) + (\omega_{ca}-1) z^{-1}} \tag{2.2}$$

where

$$\omega_{ca} = \tan\left(\frac{\pi f_c}{f_s}\right)$$

$f_c$ = cutoff frequency in Hz.
$f_s$ = sampling frequency in Hz.

The selected parameters are $f_c$ = 1800 Hz and $f_s$ = 6400 Hz. Transforming
back into the sample domain, the filter equation can be written as

$$s_f(k) = A\,s(k) + A\,s(k-1) + B\,s_f(k-1) \tag{2.3}$$
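
The coefficients A and B can be recovered by dividing the numerator and denominator of Eq. (2.2) by $(\omega_{ca}+1)$. The sketch below does this and applies Eq. (2.3) to a block; it is a derivation made here rather than values quoted from the report, and the zero initial filter state is an assumption. For the recommended parameters it gives A of roughly 0.55 and B of roughly -0.10.

```python
import math


def lowpass_coefficients(f_c=1800.0, f_s=6400.0):
    """Coefficients A and B of Eq. (2.3), derived from Eq. (2.2)."""
    w = math.tan(math.pi * f_c / f_s)   # omega_ca
    a = w / (w + 1.0)
    b = (1.0 - w) / (w + 1.0)           # sign flips moving to the right-hand side
    return a, b


def lowpass(samples, a, b, prev_in=0.0, prev_out=0.0):
    """Apply s_f(k) = A s(k) + A s(k-1) + B s_f(k-1) to a block of samples."""
    out = []
    for s in samples:
        y = a * s + a * prev_in + b * prev_out
        out.append(y)
        prev_in, prev_out = s, y
    return out


A, B = lowpass_coefficients()
step_response = lowpass([1.0] * 50, A, B)
print(round(A, 3), round(B, 3))
```

A constant input settles to the same constant at the output, since the gain at zero frequency, 2A / (1 - B), is exactly one.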

The adaptive low-pass filtering operates by checking the number of
samples in the SAMPLE buffer just prior to the pitch extraction calculations.
If there are fewer than 501 samples in the SAMPLE buffer at that
time, the low-pass filtering is skipped. If there are 501 or more samples
in the SAMPLE buffer, the filtering is implemented. Thus, filtering is only
used when buffer overflow is threatened.

When filtering is called for, a block of 80 samples is low-pass
filtered and the original samples are replaced by the filtered samples.
A block of $N_B$ samples is then processed by PARC. Note that $N_B$ may be
less than 80 so that some filtered samples may remain in the buffer and
may be filtered again during the next block time.

The filtering is performed in blocks of 80 for two reasons. First,
filtering in large blocks reduces the number of transitions between
filtered and unfiltered speech. Second, the larger block causes some
samples to be multiple-filtered if the SAMPLE buffer continues to fill.
The design of the filter causes the effect of this multiple-filtering to
be similar to filtering with a lower cutoff frequency. For the recommended
parameters, using the 1800 Hz cutoff frequency filter twice results in an
overall filter with a cutoff frequency of about 1350 Hz. In this way, the
low-pass filtering is automatically increased when needed.
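
The quoted figure of about 1350 Hz can be reproduced from the magnitude response of Eq. (2.2). The sketch below assumes that "cutoff frequency" means the -3 dB point, which the text does not state explicitly; the function name is chosen here.

```python
import math


def cascaded_cutoff_hz(passes, f_c=1800.0, f_s=6400.0):
    """-3 dB frequency of the Eq. (2.2) filter applied `passes` times."""
    w = math.tan(math.pi * f_c / f_s)
    # One pass: |H(f)|^2 = 1 / (1 + (tan(pi f / f_s) / w)^2).
    # Solve |H(f)|^(2 * passes) = 1/2 for f.
    x = math.sqrt(2.0 ** (1.0 / passes) - 1.0)
    return f_s / math.pi * math.atan(x * w)


print(round(cascaded_cutoff_hz(1)))  # 1800
print(round(cascaded_cutoff_hz(2)))  # about 1355
```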


2.4 Pitch Extraction

It is well known that voiced speech is highly correlated from pitch
period to pitch period. A long-term prediction of $s_f(k)$ given by

$$\hat{s}(k \mid k-T) = \beta\,s_f(k-T) \tag{2.4}$$

can be a good approximation to $s_f(k)$ for proper choice of $\beta$ and T. The
optimum scale factor $\beta$ depends on the correlation between $s_f(k)$ and $s_f(k-T)$,
and the best T is an estimate of the pitch period measured in samples. The
goal in selecting $\beta$ and T is to minimize the time average prediction error
over a block of K samples

$$E = \frac{1}{K}\sum_{j=1}^{K}\left[s_f(j) - \beta\,s_f(j-T)\right]^2 \tag{2.5}$$

In most applications, $\beta$ and T are computed and used over a given block
of K samples. This procedure is modified in PARC. New values of $\beta$ and T
are computed each block-time. Therefore, they are held constant for $N_B$
samples and, it will be recalled, $N_B$ varies from block to block. However,
it is not possible to determine $N_B$ until after $\beta$ and T have been chosen.
Therefore, $\beta$ and T are always calculated for a fixed block size and
used for a variable number of samples. In the recommended implementation,
the fixed computation block is K = 80.

The  pitch  period  estimate  T  is  computed  by  forming  the  Average 
Magnitude  Difference  Function  (AMDF)  and  picking  the  value  of  T  for  which 
this  function  is  minimized.  The  AMDF  function  is 

$$A(T) = \sum_{j=1}^{K}\left|s_f(j) - s_f(j-T)\right| \tag{2.6}$$

In this way, the value of T selected usually matches the value obtained by
minimizing the error E in Eq. (2.5), but with far fewer computations. Once T
has been found, the error E is minimized by selecting $\beta$ according to

$$\beta = \frac{\sum_{j=1}^{K} s_f(j)\,s_f(j-T)}{\sum_{j=1}^{K} s_f^2(j-T)} \tag{2.7}$$
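
A direct, unoptimized rendering of Eqs. (2.6) and (2.7) is sketched below. The function name, the list-based indexing, and the choice to return a gain of zero for an all-zero block are assumptions made for illustration; the candidate range for T is passed in by the caller.

```python
def estimate_pitch(s_f, start, K, t_min, t_max):
    """Return (T, beta) for the K samples s_f[start : start + K].

    s_f must hold at least t_max samples of history before `start`.
    """
    def amdf(T):
        # Eq. (2.6): sum of magnitude differences at lag T.
        return sum(abs(s_f[start + j] - s_f[start + j - T]) for j in range(K))

    # Smallest AMDF wins; ties go to the shortest lag.
    T = min(range(t_min, t_max + 1), key=amdf)

    # Eq. (2.7): least-squares gain for the chosen lag.
    num = sum(s_f[start + j] * s_f[start + j - T] for j in range(K))
    den = sum(s_f[start + j - T] ** 2 for j in range(K))
    beta = num / den if den > 0 else 0.0   # assumed fallback for silence
    return T, beta


# Impulse train with a period of 40 samples.
signal = ([100] + [0] * 39) * 8
print(estimate_pitch(signal, 100, 80, 20, 83))  # (40, 1.0)
```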

For some blocks of speech, such as those including transition regions
from silence or unvoiced speech to voiced speech, the value of $\beta$ given by
Eq. (2.7) can be large. The use of such large values, however, can actually
decrease the overall performance. Therefore, $\beta$ was limited to the range
[-2, 2] and uniformly quantized over this range. It was found
that system performance was relatively insensitive to quantization noise
on $\beta$. Therefore, $\beta$ is quantized to 97 levels. This is represented in the
transmitted block with seven bits. The 31 patterns of these seven bits
which do not represent valid $\beta$ values are used for another purpose.
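
One way to realize a 97-level uniform quantizer over [-2, 2] is sketched below. The text does not say where the levels sit, so placing them on a grid of step 1/24 that includes both endpoints is an assumption, as are the function names.

```python
BETA_MIN, BETA_MAX = -2.0, 2.0
BETA_LEVELS = 97
BETA_BITS = 7
STEP = (BETA_MAX - BETA_MIN) / (BETA_LEVELS - 1)   # 1/24 under this assumption


def quantize_beta(beta):
    """Clip beta to [-2, 2] and return a level index in 0..96."""
    clipped = max(BETA_MIN, min(BETA_MAX, beta))
    return int(round((clipped - BETA_MIN) / STEP))


def dequantize_beta(index):
    """Return the reconstruction value for a level index."""
    return BETA_MIN + index * STEP


unused_codes = 2 ** BETA_BITS - BETA_LEVELS
print(quantize_beta(5.0), quantize_beta(0.0), unused_codes)  # 96 48 31
```

The 31 spare seven-bit patterns match the count given in the text.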

By extracting pitch from different sentences and different speakers,
it was found that the pitch period T varied between 24 and 70 samples.
Hence, the searching range was chosen to be between 20 and 83, yielding 64
possible values of T, which requires a 6-bit codeword for T.

2.5 Adaptive Residual Coder

The heart of the PARC system is an Adaptive Residual Coder (ARC).
The version recommended here has been modified to optimize its operation
as a part of PARC. The input to the ARC is the pitch-reduced speech

$$v(k) = s_f(k) - \beta\,\hat{s}(k-T) \tag{2.10}$$

where $\hat{s}(k-T)$ is the reconstructed version of $s_f(k-T)$ available at both
the transmitter and receiver. The ARC consists of two principal subsystems:
an adaptive predictor and an adaptive quantizer. These will
be described in separate subsections.

2.5.1  Adaptive  Predictor 

The adaptive predictor produces a linear prediction p(k) given by

$$p(k) = \sum_{i=1}^{N} a_i(k)\,\hat{v}(k-i) \tag{2.11}$$

which is to be an estimate of v(k). The $\hat{v}(k-i)$ are the receiver's
estimates of v(k-i). It can be argued that the predictor order N should
match the order of the system which generates the v(k). However, predictors
of order larger than 4 yield unsatisfactory performance in the
presence of channel errors. Therefore, N=4 is used.

If the $a_i(k)$ accurately model the v(k), and if the $\hat{v}(k-i)$ are close
to the v(k-i), then p(k) will be a good approximation to v(k). The $a_i(k)$
are adaptive, and after p(k) is formed, they are updated. They are adapted
according to steepest descent of $e^2(k)$. This is approximated in the
system by the following updating algorithm:

$$a_i(k+1) = \delta b_i + (1-\delta)\left[a_i(k) + g\,\frac{e(k)\,\hat{v}(k-i)}{\langle|v(k)|\rangle^2}\right] \tag{2.12}$$

where $\langle|v(k)|\rangle$ is a biased exponential time average of |v(k)|:

$$\langle|v(k)|\rangle = (1-\alpha)\sum_{j=0}^{\infty}\alpha^j\,|v(k-j)| + \mathrm{RMSMIN}$$

Thus, the $a_i(k)$ updating algorithm has eight parameters: $\delta$, g, $\alpha$, RMSMIN
and $b_i$ for i=1,2,3 and 4. Three of them, $\delta$, g and $\alpha$, essentially determine
how much memory there is in the updating process. In order to minimize the
effect of channel errors, the memory time was reduced from what would be
optimal in the error-free case. This did not significantly degrade
performance. The recommended values of these parameters are

$\delta$ = 0.01

g = 0.02

$\alpha$ = 0.90


The parameters $b_i$ represent the quiescent values of the coefficients
$a_i(k)$. The values used in the original ARC are also recommended here.

$$b_i = \begin{cases} 0.7 & i=1 \\ 0 & i=2,\ 3 \text{ or } 4 \end{cases}$$


The quantity RMSMIN is perhaps the most sensitive parameter in the
algorithm. It determines the minimum value of $\langle|v(k)|\rangle$, which affects
both the adaptive predictor and the adaptive quantizer. The lower RMSMIN,
the more the system responds during low level signals. This reduces
granular noise and increases the data rate. The higher data rate means
that the SAMPLE buffer fills faster, leading to more low-pass filtering
and increased use of pitched repetition. The value selected for RMSMIN
must be matched to the dynamic range of s(k). When s(k) is represented on
the interval [-2048, 2047], a value of

RMSMIN = 50

produces a good tradeoff.
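
The predictor and its coefficient update might be sketched as follows, using the recommended parameter values. Several points are assumptions made here rather than statements from the report: the class and method names, the zero initial state, the use of the quantized error and reconstructed samples throughout (so that a receiver could run the same update), and the recursive form of the exponential average.

```python
DELTA, G, ALPHA, RMSMIN = 0.01, 0.02, 0.90, 50.0
B_QUIESCENT = [0.7, 0.0, 0.0, 0.0]   # b_1 .. b_4


class AdaptivePredictor:
    """Illustrative fourth-order backward-adaptive predictor."""

    def __init__(self):
        self.a = list(B_QUIESCENT)   # a_i(k), started at the quiescent values
        self.v_hat = [0.0] * 4       # v_hat(k-1) .. v_hat(k-4)
        self.avg = 0.0               # (1 - alpha) * sum of alpha^j |v_hat(k-j)|

    def predict(self):
        """Prediction p(k) from the four most recent reconstructed samples."""
        return sum(a * v for a, v in zip(self.a, self.v_hat))

    def level(self):
        """Biased average <|v(k)|>, never smaller than RMSMIN."""
        return self.avg + RMSMIN

    def update(self, p, e_hat):
        """Advance one sample given p(k) and the quantized error e_hat(k)."""
        v_new = p + e_hat
        self.avg = ALPHA * self.avg + (1.0 - ALPHA) * abs(v_new)
        norm = self.level() ** 2
        self.a = [DELTA * b + (1.0 - DELTA) * (a + G * e_hat * v / norm)
                  for a, b, v in zip(self.a, B_QUIESCENT, self.v_hat)]
        self.v_hat = [v_new] + self.v_hat[:-1]


pred = AdaptivePredictor()
pred.update(pred.predict(), 100.0)
print(round(pred.level(), 6), round(pred.predict(), 6))  # 60.0 70.0
```

With zero error the update reduces to a leak toward the quiescent values, which is what limits the memory of the adaptation after a channel error.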



2.5.2 Adaptive Quantizer

The prediction error e(k) is the input to the adaptive quantizer
whose basic design is illustrated in Fig. 2.4. The input is normalized
by an adaptive scaling factor $\sigma(k)$ and the result is compared to a set
of thresholds $T_i$. The recommended thresholds are symmetric and are
illustrated in Fig. 2.5 and listed in Table 2.1. The level in which the
normalized input falls specifies the quantizer output q(k). The inverse
quantizer output $\hat{e}(k)$ is the quantized version of the quantizer input. It
is the product of a scaling factor f[q(k)] and the state variable $\sigma(k)$.
The recommended scale factors are tabulated in Table 2.2. The recommended
thresholds were computed to be equidistant between the scaling factors.

The state variable $\sigma(k)$ is designed to be an approximation to the
standard deviation of e(k). Most of the time the scaled average of
|v(k)| is an acceptable estimate. However, in voiced speech at the
beginning of a pitch period, e(k) is much larger than usual. Therefore,
whenever one of the outermost quantizer levels occurs, $\sigma(k)$ is significantly
increased. If no further outer level occurs, $\sigma(k)$ decays back to
the scaled average of |v(k)|. Thus, $\sigma(k)$ is updated by

$$\sigma(k) = \max\{\mathrm{SMIN}\,\langle|v(k)|\rangle,\ \phi[q(k)]\,\sigma(k-1)\} \tag{2.13}$$

The first term in the braces of Eq. (2.13) usually dominates. This
means that the quantizer behavior is largely determined by SMIN $\langle|v(k)|\rangle$
and, hence, by the product of SMIN and RMSMIN. It is recommended that
the scale factor SMIN be set to 0.3.

The second term in the braces only affects performance at the beginning
of pitch periods. The quantizer expansion factors $\phi[q(k)]$ are given in
Table 2.2.


Table 2.1 Quantizer Thresholds

  i      $T_i$
  1      0.90
  2      3.03
  3      5.38
  4      7.75
  5     10

Table 2.2 Quantizer Scaling Factors and Expansion Factors

  q(k)      f[q(k)]     $\phi[q(k)]$
  1           0.00        0.7
  2, 7      ± 1.80        0.8
  3, 8      ± 4.25        0.9
  4, 9      ± 6.50        1.1
  5, 10     ± 8.00        1.5
  6, 11     ±12.00        2.2
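
The tables and Eq. (2.13) can be combined into a small quantizer sketch. The table values are taken from Tables 2.1 and 2.2 as printed; the function names, the assignment of levels 2 to 6 to positive inputs and 7 to 11 to negative inputs, and the exact sample at which the updated scale takes effect are assumptions, since the text does not settle them.

```python
THRESHOLDS = [0.90, 3.03, 5.38, 7.75, 10.0]      # Table 2.1
SCALE = [0.00, 1.80, 4.25, 6.50, 8.00, 12.00]    # magnitudes of f[q], Table 2.2
EXPAND = [0.7, 0.8, 0.9, 1.1, 1.5, 2.2]          # phi[q], Table 2.2
SMIN = 0.3


def quantize(e, sigma):
    """Map a prediction error to a level 1..11 using the scale sigma."""
    x = abs(e) / sigma
    m = sum(x > t for t in THRESHOLDS)           # magnitude index 0..5
    if m == 0:
        return 1
    return m + 1 if e > 0 else m + 6             # sign mapping is assumed


def magnitude_index(q):
    """Row of Table 2.2 that a level belongs to (0 for the zero level)."""
    return 0 if q == 1 else (q - 1 if q <= 6 else q - 6)


def inverse_quantize(q, sigma):
    """Quantized error: scaling factor times the state variable."""
    sign = 1.0 if q <= 6 else -1.0
    return sign * SCALE[magnitude_index(q)] * sigma


def next_sigma(q, sigma, avg_level):
    """Eq. (2.13): floor at SMIN * <|v|>, expand or decay by phi[q]."""
    return max(SMIN * avg_level, EXPAND[magnitude_index(q)] * sigma)


print(quantize(2.0, 1.0), quantize(-2.0, 1.0), quantize(11.0, 1.0))  # 2 7 6
```

An outermost level (6 or 11) multiplies the scale by 2.2, while repeated inner levels let it decay until the SMIN floor takes over.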

2.6 Pitched Repetition


Even  with  adaptive  low-pass  filtering,  sections  of  voiced  speech  can 
generate  a  large  number  of  bits  rapidly.  This  could  cause  the  SAMPLE 
buffer  to  overflow.  To  avoid  the  overflow  problem,  the  system  uses 
a  technique  known  as  Pitched  Repetition.  When  overflow  is  imminent, 
a  block  of  samples  is  deleted  at  the  transmitter  and  replaced  at  the 
receiver  with  previous  reconstructed  speech.  The  replacement  samples 
are the reconstructed samples from the previous pitch period. To keep
firm control of the overflow condition, the repetition size has to be
set carefully. One block time corresponds to 126 input samples (189
bits at 9600 b/s with 6400 samples per second). During a voiced region,
the bit rate may be as high as 3.4 bits per sample, so the 157-bit
quantizer level field may carry only about 46 samples. A repetition
size of 80 samples is therefore enough to force roughly 126 or more
samples out of the buffer in a time slot, i.e., the total number of
output samples is no less than that of input samples. The decision to
use pitched repetition is made at the beginning of each block and a
special signal is used to alert the receiver.

If  the  SAMPLE  buffer  contains  more  than  850  samples  at  the  beginning 
of  a  block-time,  pitched  repetition  is  employed.  First,  the  pitch  period  T 
is  computed  in  the  usual  way.  The  output  pointer  in  the  SAMPLE  buffer 
is  then  moved  forward  80  samples.  Thus,  these  80  samples  will  not  be 
processed by PARC. They are, however, involved in the β calculation which
takes place with the output pointer at its new location.

The receiver must still produce an output for those samples which are
not processed by PARC. It does this by using previous outputs delayed by
one pitch period. Thus, if the first sample skipped is s^(k), it is
represented at the receiver by s(k-T). Similarly, s^(k+1) is represented
by s(k+1-T) and so on. The transmitter must know this since it uses
s(k) values in computing v(k). Therefore, the s(k) buffer is filled
with prior s(k-T) values as part of the pitched repetition. After this,
PARC resumes normal operation.
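
The fill operation described above can be sketched as follows. This is a minimal illustration, not the MAP implementation: the function name `pitched_repetition_fill` and the use of a Python list for the reconstructed-speech buffer are assumptions; the repetition size of 80 samples is from the text.

```python
def pitched_repetition_fill(recon, pitch_period, n_skip=80):
    """Extend the reconstructed-speech buffer by n_skip samples, each copied
    from the sample one pitch period earlier, i.e. s(k) = s(k - T)."""
    if pitch_period <= 0 or len(recon) < pitch_period:
        raise ValueError("need at least one pitch period of history")
    out = list(recon)
    for _ in range(n_skip):
        # out[-T] is the sample T positions before the one being created;
        # when T < n_skip this may itself be a repeated sample.
        out.append(out[-pitch_period])
    return out
```

Because the same routine can run at both ends of the link, the transmitter and receiver buffers hold identical values when normal PARC processing resumes.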

This method of repeating short intervals of samples from the
previous pitch period works with little distortion during voiced speech
because samples one pitch period apart are highly correlated. In this
way, the buffer overflow problem is overcome with a minimum of additional
distortion.

The receiver must be informed that a period of pitched repetition
is taking place. As detailed in the next section, this is accomplished
through use of the unused values of β.

2.7 Noiseless Source Coder

The function of the noiseless source coder is to combine all of the
information from PARC and produce the corresponding bit stream for
transmission. The output of the coder is a sequence of 189-bit blocks.
Each block represents one set of values of β and T and N_B level
variables. In addition, the block provides for synchronization and error
control.

The format of a typical block is illustrated in Fig. 2.6. The block
is divided into three 63-bit frames to facilitate error control. The
last six bits of each frame, denoted E1, E2 and E3 in Fig. 2.6, are used
for error correction. A single-error-correcting (63,57) Hamming code is
used. The first 57 bits in each frame are available for information. The
(63,57) codes can each correct any single error, so up to three errors per
block can be corrected, provided that no two of them fall in the same
frame.
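
To make the frame-level error control concrete, the sketch below builds one possible systematic (63,57) Hamming code with the six parity bits placed last, as in the frame layout. The particular parity-check assignment (`DATA_COLS`, `PARITY_COLS`) and the function names are assumptions chosen for illustration; the report does not state which parity-check matrix was used.

```python
# Hypothetical parity-check column labels: the 57 data bits use the
# non-powers-of-two in 1..63, the 6 trailing parity bits use 1, 2, 4, ..., 32.
DATA_COLS = [c for c in range(1, 64) if c & (c - 1)]
PARITY_COLS = [1 << i for i in range(6)]
COLS = DATA_COLS + PARITY_COLS


def syndrome(frame):
    """XOR of the column labels of all bits that are set."""
    s = 0
    for bit, col in zip(frame, COLS):
        if bit:
            s ^= col
    return s


def hamming_encode(data):
    """Append 6 parity bits to 57 data bits so the syndrome becomes zero."""
    assert len(data) == 57
    s = syndrome(list(data) + [0] * 6)
    return list(data) + [(s >> i) & 1 for i in range(6)]


def hamming_decode(frame):
    """Correct at most one bit error and return the 57 data bits."""
    frame = list(frame)
    s = syndrome(frame)
    if s:
        frame[COLS.index(s)] ^= 1  # a nonzero syndrome names the bit in error
    return frame[:57]
```

Every single-bit error produces a distinct nonzero syndrome, which is why one error per 63-bit frame is always correctable.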

The first bit in the block, shown as Field A, is for block
synchronization. It is set to 0 in odd numbered blocks and 1 in even
numbered blocks. The receiver detects this pattern and knows where the
block begins.

The next field in the block, Field B, is six bits long and specifies
the pitch period T. The pitch period is constrained to be an integer
between 20 and 83, a range of 64 values, so six bits can transmit the
pitch period without quantization error.

Field C in a normal block is seven bits long and contains information
on the pitch correlation coefficient β. It was found that β can be
quantized fairly coarsely with negligible degradation, so that 97 possible
values in the range of [-2,2] are allowed. This means that 97 of the
possible 128 values of Field C are used to represent β. If one of the other
31 patterns appears, it indicates that this is not a normal block. Rather,
it is one using pitched repetition. The pitched repetition blocks are
discussed later.


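[Figure 2.6: the 189-bit normal block drawn as three 63-bit frames,
Frame A through Frame C; Frame C occupies bits 127 to 189, with its
parity bits in positions 184 to 189.]

Field        Content

A            Synchronization bit
B            Pitch period T
C            Pitch correlation coefficient β
D1,D2,D3     Quantizer levels
E1,E2,E3     Parity bits
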
Fig. 2.6  Normal Block Format

Following the β field in a normal block are the 157 bits representing
the quantizer levels (3 x 57 = 171 information bits less the 14 bits of
Fields A, B and C). These are denoted as Fields D1, D2 and D3. The
source code used for the quantizer levels is described in Table 2.3.
Quantizer levels 2 through 11 are each represented by a variable length bit
pattern. Quantizer level 1, however, occurs so often that there are two
ways of representing it. Isolated occurrences are represented by the
1-bit sequence 0. If level 1 occurs 14 times in a row, the entire string
is represented by the sequence 1110. Thus, the source code is an overfull
variable-length to variable-length mapping.

There  is  also  a  bit  pattern  associated  with  the  null  quantizer  level 
sequence;  this  is  used  to  fill  out  a  block  when  the  samples  in  the  sample 
buffer  do  not  generate  at  least  157  bits.  Because  of  the  variable  number 
of  bits  used  for  different  quantizer  level  sequences,  an  integral  number 
of  samples  will  not  always  generate  exactly  157  bits.  If  there  are  more 
than  157  bits  generated  by  a  set  of  samples,  the  excess  bits  are  the  first 
bits  transmitted  in  the  next  block's  quantizer  level  field. 
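
The level-to-bit mapping of Table 2.3 can be sketched as below. The code words are those listed in the table; the function names and the greedy choice of always using the run code when 14 consecutive level-1 values are available are assumptions made for illustration.

```python
# Code words transcribed from Table 2.3.
LEVEL_CODE = {1: "0", 7: "100", 2: "101", 8: "1100", 3: "1101",
              9: "111100", 4: "111101", 10: "1111100", 5: "1111101",
              11: "11111100", 6: "11111101"}
RUN_CODE = "1110"       # a run of 14 consecutive level-1 values
NULL_CODE = "11111110"  # filler used to pad out a block
RUN_LENGTH = 14


def encode_levels(levels):
    """Map a list of quantizer levels to a bit string (greedy run detection)."""
    levels = list(levels)
    out, i = [], 0
    while i < len(levels):
        if levels[i:i + RUN_LENGTH] == [1] * RUN_LENGTH:
            out.append(RUN_CODE)
            i += RUN_LENGTH
        else:
            out.append(LEVEL_CODE[levels[i]])
            i += 1
    return "".join(out)


def decode_bits(bits):
    """Invert encode_levels; the code is prefix-free, so bits are consumed
    until the accumulated word matches a table entry."""
    inverse = {code: [level] for level, code in LEVEL_CODE.items()}
    inverse[RUN_CODE] = [1] * RUN_LENGTH
    inverse[NULL_CODE] = []
    levels, word = [], ""
    for b in bits:
        word += b
        if word in inverse:
            levels.extend(inverse[word])
            word = ""
    return levels
```

Any bits left in `word` at the end of a block would be carried into the next block's quantizer level field, in line with the overflow rule described above.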

The normal format described above is used under most circumstances.
If a pitched repetition block is to be signaled to the receiver, however,
the block format is changed slightly in the first frame of 63 bits, as
shown in Fig. 2.7. Pitched repetition is signaled by using a special bit
pattern, a "false β", in Field C, which is usually reserved for β. The
next 7 bits are then taken from the quantizer level field D1 to create
Field C', which transmits the actual β. Thus, in such a block, only 150 bits
are used for quantizer levels.

To protect against missing a "false β", or deciding one in error, the
bit patterns for β were carefully selected. They were designed so
that neither situation can occur due to a single bit error. The "false β"
is represented by 0000000. The 7 patterns with a single one and the 21
patterns with two ones are never transmitted. Thus, only 99 patterns are
available for true β values. At the receiver, if Field C contains either
all zeros or a single 1, it is interpreted as the "false β". If no more
than one channel error has occurred, this will happen if and only if a
"false β" was actually transmitted.


Table 2.3  Description of Quantizer Level Source Code

Quantizer Level Sequence      Bit Sequence

1                             0
7                             100
2                             101
8                             1100
3                             1101
14 1's                        1110
9                             111100
4                             111101
10                            1111100
5                             1111101
11                            11111100
6                             11111101
null                          11111110


[Figure 2.7: the first 63-bit frame of a pitched repetition block, with
fields in the order A, B, C, C', D, E1.]

Field    Content

A        Synchronization bit
B        Pitch period T
C        False β
C'       Pitch correlation coefficient β
D        Quantizer levels
E1       Parity bits

Fig. 2.7  First Frame During Pitched Repetition
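
The single-error protection of the "false β" can be checked by enumeration. In the sketch below Field C is treated as a 7-bit integer; the function names are illustrative assumptions, and only the weight rule (all zeros or a single one means "false β") comes from the text.

```python
def ones_count(pattern):
    """Number of one bits in a 7-bit Field C pattern."""
    return bin(pattern & 0x7F).count("1")


def is_false_beta(field_c):
    """Receiver rule: all zeros or a single one is read as the false beta."""
    return ones_count(field_c) <= 1


# Patterns left for true beta values: weight 3 or more (weights 1 and 2 are
# never transmitted), i.e. 128 - 1 - 7 - 21 = 99 patterns.
TRUE_BETA_PATTERNS = [p for p in range(128) if ones_count(p) >= 3]


def single_errors(pattern):
    """All patterns reachable from `pattern` by exactly one bit error."""
    return [pattern ^ (1 << b) for b in range(7)]
```

A transmitted 0000000 with one error still has weight 1, while a true β pattern with one error has weight at least 2, so the two cases cannot be confused by a single channel error.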
2.8 Receiver

The principal elements of the receiver are illustrated in Fig. 2.8.
The received bit sequence b(m) is monitored by a synchronizer which
establishes the beginning of a block. The decoder transforms the bits
in a block into the quantizer levels q(k) and the pitch parameters β
and T. These are then processed by the inverse of the PARC algorithm.

The s(k) values are stored in a variable delay receiver buffer. They
are clocked out of the buffer and interpolated to produce the up-sampled
output signal. The primes on all of these quantities, introduced
earlier to account for possible channel errors, have been dropped for
notational simplicity.

The synchronizer actually operates in two modes: initial
establishment and monitoring. During the establishment phase, the
rest of the receiver system is disabled. The synchronizer looks for
a sequence of bits spaced by 189 bit-times whose polarity oscillates.
When a sequence of 10 bits with perfect oscillation is found, the
synchronizer decides that it must represent the sync bit in 10
successive 189-bit blocks. The rest of the receiver is enabled and the
synchronizer changes to the monitor mode. If it detects a significant
number of errors in the sync bits, it assumes block synchronization has
been lost. The receiver is disabled and the synchronizer returns to
the establishment mode.

Once synchronization has been established, the decoder can go to
work. It basically inverts the operations performed by the noiseless
source coder. Thus, it works on a full block of bits. The Hamming codes
are decoded and any correctable bit errors are corrected. The values of β
and T are found and the sequence of quantizer level values is formed.
If pitched repetition is being used, a special code is set. All of this
information is placed in a double-buffer which interfaces with the
PARC receiver.


Fig. 2.8  Receiver Structure


The PARC receiver processes a full block of data. The β and T
values are fixed for the block but the number of samples handled is the
variable N_B. If pitched repetition is called for, the old s(k) values
are produced before the PARC receiver is enabled.

The outputs s(k) from the PARC receiver are stored in the circular
receiver buffer. This buffer complements the variable delay transmitter
buffer and requires a carefully designed control algorithm. If there
are no channel errors, the total delay for the system will be a constant.
Therefore, the sum of the transmitter buffer delay and the receiver buffer
delay will be fixed.

The buffer control logic in the decoder is designed to prevent the
received sample buffer from ever overflowing or underflowing. In normal
operation neither of those can happen but channel errors can add or delete
samples. Since samples are removed from the buffer at a known rate, the
buffer control logic will always know how many samples are in the buffer.
If there is not enough room in the receiver buffer for all of the s(k) in
a block, the excess q(k) are simply discarded. Since a full receive buffer
corresponds to an empty transmitter buffer, this usually occurs during
silence. The deletion of silence is generally not a problem.

As the transmitter buffer fills, the receiver buffer empties. If
channel errors have caused the deletion of samples, it is possible for the
receiver to run out of s(k) values. This underflow condition is also
prevented by the buffer control logic. If the number of s(k) values in a
block plus the number of s(k) stored in the buffer do not total at least 126,
additional q(k) representing silence are added to the LEVEL buffer. Thus
there are always enough.
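
The two buffer control rules can be written as simple counting functions. This is a sketch under stated assumptions: the function names and the `capacity` parameter are hypothetical (the report does not give the receiver buffer size here); the 126-sample minimum is from the text.

```python
def silence_levels_needed(n_in_block, n_in_buffer, min_total=126):
    """Underflow rule: number of silence levels to append so that the block
    plus the buffered samples total at least one block time (126 samples)."""
    return max(0, min_total - (n_in_block + n_in_buffer))


def samples_to_keep(n_in_block, n_in_buffer, capacity):
    """Overflow rule: number of decoded samples that fit in the receiver
    buffer; any excess levels in the block are discarded."""
    return max(0, min(n_in_block, capacity - n_in_buffer))
```

For example, a block of 100 samples arriving at a buffer holding 20 would be padded with 6 silence levels.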

The upsampling has been added to reduce the sampling noise caused
by the non-ideal nature of the analog output filter. The 6400 samp/sec
sampling rate is at the Nyquist rate of the 3200 Hz output filter. Thus,
severe aliasing is possible. The upsampler effectively increases the
sampling rate to 12800 samp/sec. It does this by interpolating between
successive s(k) values. It was found that linear interpolation was
sufficiently accurate to greatly reduce the aliasing.
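
A two-to-one linear interpolator of the kind described can be sketched as follows. The function name and the handling of the final sample (emitted without a following midpoint) are assumptions for illustration.

```python
def upsample_by_two(samples):
    """Double the sampling rate by inserting the midpoint between each pair
    of successive samples (linear interpolation)."""
    samples = list(samples)
    out = []
    for current, following in zip(samples, samples[1:]):
        out.append(current)
        out.append((current + following) / 2)
    if samples:
        out.append(samples[-1])  # last sample has no successor in this batch
    return out
```

In a streaming implementation the last sample of one batch would be held over so that its midpoint with the first sample of the next batch can be formed.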
CHAPTER 3
SYNCHRONIZATION

Although PARC is basically a sequential algorithm, the use of pitch
redundancy reduction and error control forces on it a block structure.
The transmitter quantizes a block of N_B speech samples using a set of
pitch reduction parameters and encodes this into frames of binary
information. At the other end, the receiver must identify these frames to be
able to properly decode the information it receives. This necessitates
some sort of synchronization between the transmitter and the receiver. In
this chapter, the synchronization technique used in PARC and its
implementation are described. Further, the operation is analyzed to
illustrate its satisfactory performance.

There are two aspects to the synchronization operation performed in
the receiver. The receiver must first locate the frame boundaries in the
received bit stream. This is called synchronization acquisition. After
acquisition, it must monitor the frame boundaries on a continuing basis to
ensure that sync is not lost. This is called synchronization monitor.
Contract requirements specify that bits are not dropped during transmission.
Therefore, once synchronization is properly acquired, there should be no
way of losing it. However, there are several reasons for providing the sync
monitor. Sync acquisition is a probabilistic operation; and although there
is a high probability of acquiring sync properly in one attempt, in case of
a wrong decision, the sync monitor provides a way for re-attempting sync
acquisition. There are other abnormal conditions that inevitably occur
during algorithm development and testing which can also cause sync to be
lost. Some of these are bad connections on the digital I/O connectors, faulty
IOS operation (see Chapter 12), and reinitialization of the algorithm at one
end of the communication system. All these make it imperative to provide
the algorithm with the sync monitor, in other words, with re-syncing
capabilities to ensure proper uninterrupted operation.

3.1 Sync Algorithm Description

The  technique  selected  here  to  synchronize  the  receiver  and  the 
transmitter  is  similar  to  that  used  in  the  T1  Carrier  System.  A  bit 
pattern  called  the  synchronization  pattern  is  selected.  One  bit  at  a 
time  from  the  predetermined  pattern  is  interjected  at  regular  intervals 
into the bit stream generated at the transmitter. In this system, the
sync bit is inserted at the beginning of each frame of 189 bits generated
by processing a block of N_B samples. The receiver tries to locate the
sync pattern embedded in the received bit stream, thereby locating the
frame boundaries.

Any bit pattern can be used for the sync pattern as long as it does
not coincide with some naturally generated pattern at the transmitter.
Using a shorter sync pattern reduces the memory space and computation
required at the receiver during sync acquisition. After some consideration,
a two bit pattern 01 was selected for the synchronization pattern.

The  following  subsections  detail  the  algorithm  for  the  two  aspects 
of  synchronization.  A  short  analysis  is  also  presented  with  each  to  get 
some  idea  of  the  performance  of  these  operations. 

3.1.1  Synchronization  Acquisition 

There  are  two  considerations  in  selecting  this  algorithm.  First,  it 
should  not  take  too  long  for  each  acquisition  operation.  And  secondly,  the 
probability  of  making  the  right  decision  should  be  reasonably  high  to  ensure 
that  the  right  synchronization  is  achieved  in  a  couple  of  attempts,  if  not 
in  one. 

The sync acquisition algorithm consists of segmenting the received bit
stream into blocks of 189 bits each. One of the 189 bits is the sync bit,
and the corresponding bit position follows the sync pattern over the
blocks of received bits. The decoder generates the sync pattern at the
receiver, and checks its correlation with each of the 189 bit positions.
The position that correlates exactly with the sync pattern for 10 blocks
is picked to be the sync bit, marking the beginning of subsequent frames.
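
The acquisition search as described can be sketched as a column-by-column comparison. This follows the description in the text, not the reduced-computation implementation mentioned later; the function name, and the assumption that the locally generated pattern starts with 0, are illustrative choices.

```python
def find_sync_candidates(bits, block_len=189, n_blocks=10):
    """Return every bit position whose values over n_blocks consecutive
    blocks match the locally generated 0101... sync pattern exactly."""
    expected = [i % 2 for i in range(n_blocks)]  # assumed phase: starts at 0
    candidates = []
    for pos in range(block_len):
        column = [bits[pos + i * block_len] for i in range(n_blocks)]
        if column == expected:
            candidates.append(pos)
    return candidates
```

Acquisition is unambiguous when exactly one position is returned; an empty list or several candidates would call for another attempt on the next 10 blocks.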

For sync acquisition to be unambiguously successful, only the sync
bit of the 189 bits in a frame must correlate exactly with the sync pattern
for 10 blocks. If there were no transmission errors, the problem here would
consist of picking a deterministic sequence from the midst of a stochastic
process. However, the received sync pattern is corrupted by transmission
errors at an average error rate of 1%. Each of the other 188 bit positions
carries random 1's and 0's and correlates with the sync pattern with a
probability of 0.5 per block. With these and the assumption that the channel
affects the bits independently, the probability of making a correct decision
about sync acquisition can be determined.

The probability that the sync bit is transmitted without errors for 10
consecutive blocks is

    \alpha = (0.99)^{10} = 0.9044

The probability that one of the other bit positions correlates perfectly
with the sync pattern for 10 consecutive blocks is

    \beta = (0.5)^{10} = 0.000976

The probability that none of the other 188 bit positions correlates
perfectly with the sync pattern is (1-β)^188. The probability of an
unambiguous sync acquisition decision is

    \alpha (1-\beta)^{188} = 0.9044 \times 0.8322 = 0.7526

Thus, the probability of successful sync acquisition in several
attempts can be computed:

In one attempt,    P{sync acquisition} = 0.7526, P{failure} = 0.2474
In two attempts,   P{sync acquisition} = 0.9388, P{failure} = 0.0612
In three attempts, P{sync acquisition} = 0.9848, P{failure} = 0.0151

The implementation of this algorithm is slightly different from
the description here. This was done to reduce the amount of computation
required.
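
The figures above can be reproduced with a few lines of arithmetic. The variable and function names are illustrative; the 1% error rate, 10-block window and 188 competing bit positions are from the text.

```python
ALPHA = 0.99 ** 10            # sync bit received error-free in 10 blocks
BETA = 0.5 ** 10              # a random position imitates the pattern
P_ONE_ATTEMPT = ALPHA * (1 - BETA) ** 188


def p_acquired_within(attempts):
    """Probability that at least one of `attempts` independent acquisition
    attempts is unambiguously successful."""
    return 1 - (1 - P_ONE_ATTEMPT) ** attempts
```

The values agree with the tabulated ones to within rounding in the last digit.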
3.1.2 Synchronization Monitor


There are three considerations in selecting the algorithm here.
First, the sync monitor checks to see if the received sync bits follow the
sync pattern. Because of transmission errors, there are some errors in the
received sync bits in spite of the error control. After allowing for these
errors, the algorithm should decide that sync is retained. The probability
of erroneously deciding that sync is lost should be extremely small to
ensure that the receiver operates uninterrupted for long periods of time.
Secondly, if sync is lost, the algorithm should realize this in a reasonably
short time. And finally, the algorithm should be computationally simple
to implement.

With these considerations in mind, the following algorithm is suggested.
A correlation between the received sync bit Ŝ_i and the expected sync bit
S_i is computed. Based on the correlation, a value is assigned to a random
variable x_i.

    x_i = \begin{cases} +2 & \text{if } \hat{S}_i \ne S_i \\ -1 & \text{if } \hat{S}_i = S_i \end{cases}

This variable is used to update a sync variable v_i.

    v_i = v_{i-1} + x_i

If at any time the variable v_i exceeds the threshold T, it is decided that
sync has been lost. The variable v_i starts with an initial value 0, and is
constrained to be non-negative for all i. If its value drops below 0, it
is reinitialized to 0. The threshold T used here is 12.
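
The monitor amounts to a clamped up/down counter, sketched below. The function name is an assumption, and so is the choice to declare loss when the variable reaches the threshold (rather than strictly exceeds it), which matches the 6-frame figure quoted later for an error rate of 1.

```python
def frames_until_sync_loss(received, expected, threshold=12):
    """Run the sync monitor over paired sync bits and return the number of
    frames processed when loss is declared, or None if sync is retained."""
    v = 0
    for n, (r, e) in enumerate(zip(received, expected), start=1):
        v += 2 if r != e else -1   # +2 on a mismatch, -1 on a match
        v = max(v, 0)              # the sync variable is kept non-negative
        if v >= threshold:         # assumption: reaching T counts as loss
            return n
    return None
```

An isolated sync bit error raises the variable by 2, and the next two correct bits bring it back to 0, so occasional channel errors do not accumulate.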

This algorithm has the effect of switching the rate of change of the sync
variable from +2 when the received and expected sync bits differ to -1 when
they agree. For a channel error rate r, the average rate of change is
2r - (1 - r) = 3r - 1. If r is less than 1/3, the rate of change is
negative. The sync variable decays to 0 and stays there. If the channel
error rate is greater than 1/3, the rate of change is positive and the sync
variable drifts towards the threshold T. The maximum positive slope is 2 and
occurs when the channel error rate is 1.

The rate of change is a binary random variable. Over a period of
time, its average is the slope Δv of the sync variable. The slope Δv
can be used to determine the time before sync is lost.

    t = T / \Delta v

The slope Δv is a binomial random variable. Its function t, the time before
sync is lost, which is also a random variable, can be described by a
distribution similar to the binomial distribution.

    P(t) = \frac{\Gamma(t+1)}{\Gamma(t/3+5)\,\Gamma(2t/3-3)}\; r^{(t/3+4)}\,(1-r)^{(2t/3-4)}, \qquad t \ge 6

Its expected value is

    \langle t \rangle = T / (3r - 1)

Using these, some estimates of the performance of the sync monitor algorithm
can be obtained. While proper sync is retained, the error rate for the sync
bit is reduced to 0.005 by the error control. At this error rate, the
algorithm would never lose sync. If the error rate deviated from its average
value to 1, it would take 6 frames for sync to be lost. The probability of
this can be computed to be (0.005)^6 = 1.5625 x 10^-14.

If sync is actually lost, the average error rate for the sync bit is
0.5. The algorithm would take an average of 20 frames before deciding it has
lost sync.
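
The drift estimate and the false-alarm probability quoted for the in-sync case can be evaluated directly. The names below are illustrative; the formula is the expected-value expression T/(3r - 1), which applies only when the average slope is positive.

```python
def expected_frames_to_loss(threshold, error_rate):
    """Mean drift time T / (3r - 1); infinite when the average slope of the
    sync variable is not positive (r <= 1/3)."""
    slope = 3 * error_rate - 1
    if slope <= 0:
        return float("inf")
    return threshold / slope


# Six consecutive sync bit errors at a residual error rate of 0.005.
P_SIX_ERRORS_IN_A_ROW = 0.005 ** 6
```

At the in-sync error rate of 0.005 the slope is negative, which is the sense in which the monitor "never" loses sync through drift.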


The implementations of these two algorithms are described in Chapter 12.
The sync acquisition algorithm is implemented slightly differently from its
description here to reduce the computation involved. The sync monitor
algorithm is implemented as described here using simple logical and
shift operations.


CHAPTER 4

PITCH EXTRACTION STUDIES


4.1 Introduction

The aim of efficient coding methods is to reduce the channel capacity
required to transmit a signal with specified fidelity. To achieve this
objective, it is desirable to reduce the redundancy of the transmitted
signal. One well-known procedure for removing signal redundancy is
predictive coding. In predictive coding, redundancy is removed by subtracting
from the signal that part which can be predicted from its past. The PARC
system is essentially an APC system which includes a pitch extraction loop
for long-term redundancy removal.

In this chapter, several studies concerning pitch redundancy removal in
PARC are described. The correlation technique, as well as the AMDF
algorithm, for pitch extraction is outlined in Section 4.2. The complete
algorithm with the pitch extraction loop was simulated on a digital
computer; simulation results are discussed in Section 4.3.


4.2 Pitch Extraction Algorithms


It is well known that voiced speech is highly correlated from pitch
period to pitch period [1]. The long term prediction of s(k) is given
by

    s(k|k-T) = \beta\, s(k-T)    (4.1)

Here β is a scalar which depends on the correlation between s(k) and s(k-T)
while T is an estimate of the pitch period (in samples). The use of β
reflects the amplitude changes of the speech signal which occur from period
to period, especially during the beginning and end of the voiced segments.
For unvoiced speech, β is generally small and long-term prediction is
relatively ineffective. The long-term prediction s(k|k-T) is subtracted from
s(k) to form the pitch-reduced speech v(k) = s(k) - s(k|k-T).

The goal in selecting β and T is to minimize the error

    E_1 = \frac{1}{K} \sum_{j=1}^{K} [s(j) - \beta\, s(j-T)]^2    (4.2)

Here block adaptation with block length of K has been assumed. The
choice of K depends on various factors and will be discussed in a later
section. The derivative of E_1 with respect to β yields

    \frac{\partial E_1}{\partial \beta} = -\frac{2}{K} \sum_{j=1}^{K} [s(j) - \beta\, s(j-T)]\, s(j-T)    (4.3)

Equating this derivative to zero and solving for β gives

    \beta = \frac{\sum_{j=1}^{K} s(j)\, s(j-T)}{\sum_{j=1}^{K} s^2(j-T)}    (4.4)

If this result is substituted in Eq. (4.2), the error becomes a function
of T alone given by

    E_1(T) = \frac{1}{K} \sum_{j=1}^{K} s^2(j) - \frac{1}{K}\, \frac{\left[ \sum_{j=1}^{K} s(j)\, s(j-T) \right]^2}{\sum_{j=1}^{K} s^2(j-T)}    (4.5)

Therefore, to minimize E_1 with respect to T it is necessary to maximize the
rightmost term of Eq. (4.5). The approach used was to compute

    A(T) = \frac{\left[ \sum_{j=1}^{K} s(j)\, s(j-T) \right]^2}{\sum_{j=1}^{K} s^2(j-T)}    (4.6)

for all values of T, T_min ≤ T ≤ T_max, and then select the value for which
A(T) is maximum. The lower limit, T_min, of the search range was selected
to be smaller than the minimum value of pitch periods for different speakers
while the upper limit, T_max, is influenced by various factors such as the
number of bits available for transmission of T's, processing time
limitations due to real time application and maximizing energy reduction.
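
The search defined by Eqs. (4.4) and (4.6) can be sketched as follows. This is an illustrative sketch, not the simulation code: the function name, the `start` index convention (the block covers samples start .. start+K-1 and requires start >= t_max), and the optional limiting of β to [-2, 2] mentioned later in this section are assumptions about how the pieces fit together.

```python
def correlation_pitch(s, start, K, t_min, t_max, beta_limit=2.0):
    """Return (T, beta): T maximizes A(T) of Eq. (4.6) over the block
    s[start : start + K], and beta follows from Eq. (4.4), limited to
    [-beta_limit, beta_limit]."""
    best = None  # (A(T), T, beta)
    for T in range(t_min, t_max + 1):
        num = sum(s[j] * s[j - T] for j in range(start, start + K))
        den = sum(s[j - T] ** 2 for j in range(start, start + K))
        if den == 0:
            continue  # no energy one lag back; A(T) is undefined
        score = num * num / den
        if best is None or score > best[0]:
            best = (score, T, num / den)
    if best is None:
        return None, 0.0
    _, T, beta = best
    return T, max(-beta_limit, min(beta_limit, beta))
```

For a strictly periodic block the maximum of A(T) equals the block energy and is reached at the period, where β evaluates to 1.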

The above method, though simple to implement, involves extensive
computation. For example, if the block length is K and the search range
covers R values of T, then for each candidate T there are 2K+2
multiplications and K additions. Hence the total number of multiplications
for finding the pitch period becomes R(2K+2); if R=100 and K=100 then this
number is 20200 (this is just for one block of 100 samples). This many
multiplications consume significant processing time, which is crucial in a
real time implementation on the Macro Arithmetic Processor (MAP).

The stringent requirement on timing in the MAP led to a modification
of the above correlation technique for pitch extraction. This is done by
forming the Average Magnitude Difference Function (AMDF) [2]

    A'(T) = \sum_{j=1}^{K} |s(j) - s(j-T)|, \qquad T_{min} \le T \le T_{max}    (4.7)

It is easy to see that for any periodic function, the above sum is
minimum for T equal to the period. Hence, the pitch parameter T was
determined by minimizing the function A'(T) with respect to T. The
gain parameter was obtained by substituting this value of T into Eq. (4.4).
The computational saving in this method is apparent since there is no
multiplication involved.
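
The AMDF search of Eq. (4.7) can be sketched in the same style. The function name and the `start` index convention (block samples start .. start+K-1, with start >= t_max) are assumptions; ties are resolved in favor of the smallest T, which the report does not specify.

```python
def amdf_pitch(s, start, K, t_min, t_max):
    """Return the lag T in [t_min, t_max] that minimizes
    A'(T) = sum |s(j) - s(j - T)| over the block s[start : start + K]."""
    def amdf(T):
        return sum(abs(s[j] - s[j - T]) for j in range(start, start + K))

    return min(range(t_min, t_max + 1), key=amdf)
```

Only subtractions, magnitudes and additions are needed inside the search loop; the single evaluation of Eq. (4.4) for β at the selected T is the only step involving multiplications.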

This modification of the correlation method gives nearly identical values
of T (and hence of β) in voiced speech but differs in unvoiced speech.
However, unvoiced speech is non-periodic and T is arbitrary and hence not
important.

Figure 4.1 shows the plot of β and T against speech samples. The
correlation coefficient β jumps to a high value for a voiced speech block
which is followed by silence. This high value amplifies the quantization
noise, thus decreasing the overall signal to noise ratio. Limiting β to
[-2,2] was found to eliminate this problem and give satisfactory
performance. Further discussion of limiting in the context of quantization
of β and coding is presented in Chapter 11.

[Figure 4.1: β and T plotted against speech sample number.]

The PARC algorithm was simulated on a PDP 11/60 computer. The following
phonetically balanced sentences were chosen for the simulation work.

Male speaker:   sentence 1  - "Cats and dogs each hate the other."
Female speaker: sentence 11 - "The pipe began to rust while new."

The β's and T's were extracted using both the correlation and AMDF
methods discussed above. The results of this study are shown in Table 4.1.


TABLE 4.1a

Comparison of Correlation and AMDF Methods
for Pitch Extraction

Sentence 11: Female Speaker

               Correlation Technique       AMDF Algorithm
Sample No.         β         T               β         T

  3500           -0.26       23             0.06       50
  3600           -0.23       25             2.00      134
  3700            0.42       22             0.57      147
  3800            0.85       58             0.85       58
  3900            1.03       29             1.03       29
  4000            0.95       29             0.95       29
  4100            0.91       88             0.91       88
  4200            0.93       88             0.93       88
  4300            0.99       29             0.99       29
  4400            0.99       29             0.99       29
  4500            0.80       29             0.80       29
  4600            1.02       29             1.02       29
  4700            1.00       29             1.00       29
  4800            0.92       29             0.92       29
  4900            0.61       28             0.61       28
  5000            0.31       25             0.24       89
  5100            0.32       20             0.32       20
  5200            0.56       94             0.41      120
  5300            0.30       62             0.24       23
  5400            0.93       27             0.93       27
  5500            0.87       29             0.87       29
  5600            0.89       30             0.89       30
  5700            0.99       30             0.99       30
  5800            1.01       30             1.01       30
  5900            1.01       30             1.01       30
  6000            1.07       30             1.07       30
  6100            1.07       30             1.07       30
  6200            1.08       30             1.08       30
  6300            1.04       30             1.04       30
  6400            1.01       30             1.01       30
  6500            0.97       30             0.97       30
  6600            0.94       30             0.94       30
  6700            0.86       30             0.86       30
  6800            0.79       30             0.79       30
  6900            0.66       31             0.66       31
  7000            0.19       63             0.04       27


TABLE 4.1b

Comparison of Correlation and AMDF Methods
for Pitch Extraction

Sentence 1: Male Speaker

               Correlation Technique       AMDF Algorithm
Sample No.         β         T               β         T

  1700            0.53      141             0.37       38
  1800            0.35      141             0.35      141
  1900            0.48      135             0.32       22
  2000            0.11      148            -0.25      133
  2100            0.30       27             0.20      145
  2200            1.45       42             1.45       42
  2300            0.71       48             0.71       48
  2400            0.76       50             0.76       50
  2500            0.93       52             0.93       52
  2600            0.74       53             0.74       53
  2700            0.96       53             0.96       53
  2800            0.94       53             0.94       53
  2900            0.94       54             0.94       54
  3000            0.94       55             0.94       55
  3100            0.90       57             0.90       57
  3200            0.72       59             0.72       59
  3300            0.60       61             0.60       61
  3400            0.54       62             0.52       63
  3500            0.79       24            -0.98       31
  3600            1.16       52             1.16       52
  3700            1.21       53             1.21       53
  3800            1.01       53             1.01       53
  3900            0.89       53             0.89       53
  4000            0.81       55             0.81       55
  4100            0.97       57             0.97       57
  4200            0.98       58             0.98       58
  4300            1.00       58             1.00       58
  4400            0.96       59             0.96       59
  4500            0.85       59             0.86       60
  4600            0.92       60             0.92       60
  4700            0.84       61             0.84       61
  4800            0.74       62             0.75       63
  4900            0.69       65             0.69       65
  5000            0.60       66             0.60       66
  5100            0.60       68             0.61       67
  5200            0.42       65             0.44       67
  5300            0.93       64             0.93       64
  5400            0.71       64             0.71       64
  5500            0.87       65             0.88       66
  5600            0.66      131             0.66      131
  5700            0.33       66             0.34       67
  5800            0.32       62             0.32       62

4.3 Redundancy Removal


As described earlier, the APC system is based on the removal of two
kinds of redundancy: short-term redundancy caused by the vocal tract filter
and long-term redundancy caused by the pitch frequency. Once the pitch
period T and gain parameter β are determined, reduced speech is formed as

    v(k) = s(k) - \beta \, \hat{s}(k \mid k-T)                    (4.8)

where \hat{s}(k \mid k-T) represents the reconstructed speech sample.
Figure 4.2 shows the plot of reduced and original speech. It is easy to
notice the energy reduction achieved in the almost periodic voiced portion
of the speech. The amount of energy reduction achieved is expressed by SER
(Signal Energy Reduction), which is calculated as

    SER = -10 \log_{10} \frac{\sum_k v^2(k)}{\sum_k s^2(k)}       (4.9)

As the signal energy reduction increases, the dynamic range of the input
signal to the quantizer is reduced; hence the reduced speech signal can be
represented by the lower quantizer levels, thus requiring fewer bits for
transmission. The SER can be increased by accurately picking the pitch
period T and choosing the block size such that the effect of the transition
of β from block to block is minimal. The parameters β and T are associated
with every block. For smaller block sizes, the number of parameters to be
transmitted per second is increased. However, for smaller block sizes, the
amplitude variations are closely represented by β and the transition of β
from block to block is smooth. These factors contribute to improved SER.
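
The redundancy-removal step and the SER measure of Eq. (4.9) can be checked numerically. The Python fragment below is only a minimal sketch and not the implementation used in this study: the function names, the synthetic periodic test signal, and the use of the original signal in place of the reconstructed speech are all assumptions made for illustration.

```python
import numpy as np


def pitch_reduce(s, s_rec, beta, T, start, stop):
    """Form v(k) = s(k) - beta * s_rec(k - T) for start <= k < stop.

    s_rec stands for the reconstructed speech; requires start >= T.
    """
    k = np.arange(start, stop)
    return s[k] - beta * s_rec[k - T]


def ser_db(s_block, v_block):
    """Signal energy reduction: -10 log10(sum v^2 / sum s^2)."""
    return -10.0 * np.log10(np.sum(v_block ** 2) / np.sum(s_block ** 2))


# Hypothetical test signal: exactly periodic with a period of 30 samples.
T = 30
n = np.arange(600)
s = np.sin(2 * np.pi * n / T) + 0.5 * np.sin(4 * np.pi * n / T)

# Assumption: the original signal is used as a stand-in for the
# reconstructed speech, which a real coder would not have available.
v = pitch_reduce(s, s, beta=0.9, T=T, start=100, stop=200)
ser = ser_db(s[100:200], v)
print(f"SER = {ser:.2f} dB")
```

For an exactly periodic signal, v(k) is (1 - β) s(k), so the SER depends only on β. Real speech is only approximately periodic, which is why the choice of T and of the block size matters.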

Table  4.2  shows  the  effect  of  block-size  variation  on  SER.  The 
following  performance  measures  were  also  computed: 


Table 4.2

Effect of Block Size on Various Performance Measures


 Block        SER        Overall      Inloop       Entropy
 Size         (dB)       SNR (dB)     SNR (dB)     (bits/sample)

Sentence [number not legible]

   20         8.47        20.81        12.34         1.53
   40         7.07        20.24        13.17         1.48
   80         5.63        19.08        13.45         1.46
  100         5.44        18.97        13.53         1.43
  120         5.15        18.85        13.70         1.43
  140         4.70        18.26        13.56         1.44
  160         4.56        18.33        13.77         1.45
  180         4.41        18.31        13.90         1.42
  200         4.07        18.20        14.13         1.43

Sentence [number not legible]

   20        11.19        22.78        11.59         1.48
   40         9.81        22.44        12.63         1.44
 [illegible]  8.48        21.72        13.24         1.43
  120         8.04        21.65        13.61         1.42
  140         8.20        21.57        13.37         1.42
  160         7.92        21.59        13.67         1.43
  180         8.16        21.64        13.48         1.43
  200         8.33        21.58        13.25         1.44



    Overall SNR = 10 \log_{10} \frac{\sum_k s^2(k)}
                                    {\sum_k [s(k) - \hat{s}(k)]^2}     (4.10)

    Inloop SNR = 10 \log_{10} \frac{\sum_k v^2(k)}
                                   {\sum_k [v(k) - \hat{v}(k)]^2}      (4.11)

where \hat{v}(k) = v(k) + n_q(k), and

    Entropy H = - \sum_{i=1}^{11} p_i \log_2 p_i                       (4.12)

where p_i = probability of occurrence of the i-th quantizer level.
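
The measures of Eqs. (4.10) to (4.12) are straightforward to compute from sample sequences. The following Python fragment is an illustrative sketch only; the function names and test vectors are assumptions, and the entropy is taken in base 2 so that it is expressed in bits/sample. Note that when the reconstruction error s(k) - ŝ(k) equals the in-loop error v(k) - v̂(k), the overall SNR in dB is the sum of the SER and the inloop SNR; the entries of Table 4.2 are consistent with this (for example, 8.47 + 12.34 = 20.81).

```python
import numpy as np


def snr_db(x, x_hat):
    """10 log10( sum x^2 / sum (x - x_hat)^2 ), as in Eqs. (4.10)-(4.11)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum((x - x_hat) ** 2))


def entropy_bits(level_indices):
    """Empirical entropy in bits/sample of the quantizer level indices."""
    _, counts = np.unique(np.asarray(level_indices), return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))


# Hypothetical data, chosen only to exercise the two functions.
signal = [1.0, 1.0, 1.0, 1.0]
reconstruction = [0.9, 0.9, 0.9, 0.9]
print(f"SNR     = {snr_db(signal, reconstruction):.2f} dB")
print(f"Entropy = {entropy_bits([0, 0, 1, 1]):.3f} bits/sample")
```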

As SER improves, overall performance also improves. It is interesting to
note that the SNR (overall) may increase even if the SNR (inloop) decreases
because of the improvement in SER. The speech signal spectrum becomes
flatter because of pitch extraction, thus adversely affecting the
performance of the predictor in INLOOP. However, because of the smaller
dynamic range of the reduced speech, the quantizer noise is decreased, which
more than compensates for the poor performance of the predictor. Hence the
overall performance improves.

The searching range (T_max - T_min) for pitch extraction also affects
the redundancy removal. It was observed that a longer searching range
gives better SER, while a small searching range decreases SER by as much
as 2 dB. A searching range of the order of twice the maximum pitch period
appears to be sufficient. However, the longer searching range also means
more computations and hence more CPU time. In the real-time simulations
on the MAP, timing is critical and therefore the number of computations
and memory transfers needs to be reduced. In such cases the searching range
must be reduced to achieve a compromise between the number of computations
and the reduction in SER that can be tolerated.

It was noticed that the pitch extraction algorithm sometimes picks
double or triple pitch periods. This fact has only a modest effect on
redundancy removal. However, the transmission of double or triple pitch
period values may require the allocation of more bits for transmission of T.
Again, it is desirable to limit the search range.

In Section 4.2, the estimation algorithm used to compute the pitch
period T and long-term gain β is based solely on the original speech. In
fact, as seen by examining Fig. 2.10, the long-term redundancy removal
operation actually subtracts the reconstructed speech from the original
speech. In an attempt to compensate for this fact, the β obtained from
Eq. (4.4) was modified by multiplying by a scalar a as

    \beta^{*} = a \beta                                   (4.13)

Here a can be expressed [3] as

    a = \frac{1}{1 + \alpha}                              (4.14)

where α is the inverse of the signal-to-noise ratio. The parameter a was
varied between 0 and 1.2, but no significant improvement was noticed. See
Table 4.3.



TABLE  4.3 

The  Effect  of  a  on  SER  and  Overall  SNR 

    a         SER (dB)     Overall SNR (dB)

  0             0              15.61
  0.9           5.45           19.23
  0.94          5.55           19.29
  0.96          5.57           19.24
  0.98          5.55           19.40
  0.995         5.52           19.28
  1.0           5.55           19.31
  1.04          5.48           19.16
  1.1           5.37           19.21
  1.2           5.07           18.79

CHAPTER 5
TREE CODING

5.1 Introduction

The concept known as tree coding was investigated as part of
this study to evaluate its ability to improve performance. One
particular algorithm, the (M,L) algorithm, was the basis for most of the
investigation. A modified version of the (M,L) algorithm called
adaptive tree searching was developed and investigated. Simulations
indicated that adaptive tree searching marginally improved the
performance of the PARC algorithm.


5.2  The  (M,L)  Algorithm 


5.2.1  Description 

The  (M,L)  algorithm  is  one  of  a  number  of  algorithms  which  perform 
what  is  known  as  tree  coding.  The  main  idea  of  tree  coding  is  to  defer 
making  a  decision,  in  this  case,  which  quantizer  level  should  be  used  for 
a  given  sample,  until  a  later  time  when  it  can  be  made  in  light  of  that 
which  follows.  Tree  coding  is  useful  in  predictive  or  backward  adaptive 
quantization  systems,  because  the  selection  of  a  quantizer  level  affects 
the  selection  of  quantizer  levels  in  the  future.  This  effect  can  be 
represented  graphically  in  the  form  of  a  tree,  where  a  node  represents 
the  "state"  of  the  system  as  a  result  of  selecting  the  sequence  of 
quantizer  levels  leading  to  that  "state",  with  a  branch  connecting  the 
node  to  the  node  representing  the  previous  "state".  The  tree  is  rooted 
by  an  arbitrary  "state"  at  an  arbitrary  time,  and  evolves  in  time,  with 
a  new  level  of  nodes  added  at  each  sample  time. 

The (M,L) algorithm operates in the following way, and is illustrated
in Figure 5.1. At a given time, let us say that there are n nodes in the
outermost level of the tree, and a new sample is to be quantized. A new
level of nodes is then "grown" from the outermost level, representing all
of the new possible "states". Thus, if there are k quantizer levels,
there will be kn nodes in the new level. These new nodes are then ranked
by some performance criterion, such as quantization noise. Nodes are
next pruned from the tree. This is a key step, because it is this that
prevents the task from growing exponentially. In order to ensure that a
quantizer level is selected for a sample in a finite amount of time, a
quantizer level decision is forced, if necessary, for the sample L
time units ago, by picking the predecessor node, in the level L levels
in, of the best of the new nodes. All branches which do not stem from
that predecessor are pruned. If more than M new nodes remain, the M
best nodes are kept, and the rest are pruned. The process then starts
again with the next sample.
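
The grow, rank, force-decision and prune steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the simulation software of this study: it uses a fixed first-order predictor (coefficient `a`) and a fixed quantizer rather than the adaptive ARC structure, takes accumulated squared error as the performance criterion, and all names (`Node`, `ml_encode`, `decode`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Node:
    state: float            # last reconstructed sample (predictor memory)
    cost: float             # accumulated squared error along this path
    path: Tuple[int, ...]   # quantizer indices not yet released, oldest first


def ml_encode(x: Sequence[float], levels: Sequence[float],
              M: int, L: int, a: float = 0.9) -> List[int]:
    """Toy (M,L) search; returns one quantizer index per input sample."""
    nodes = [Node(0.0, 0.0, ())]
    released: List[int] = []
    for sample in x:
        # Grow: every surviving node spawns one child per quantizer level.
        grown = []
        for node in nodes:
            prediction = a * node.state
            for index, level in enumerate(levels):
                rec = prediction + level
                grown.append(Node(rec, node.cost + (sample - rec) ** 2,
                                  node.path + (index,)))
        # Rank by the performance criterion (lower cost is better).
        grown.sort(key=lambda nd: nd.cost)
        # Force a decision for the oldest pending sample once L are pending:
        # take the ancestor of the best new node and drop all other branches.
        if len(grown[0].path) >= L:
            decided = grown[0].path[0]
            released.append(decided)
            grown = [Node(nd.state, nd.cost, nd.path[1:])
                     for nd in grown if nd.path[0] == decided]
        # Keep at most the M best survivors.
        nodes = grown[:M]
    # Flush: release the pending indices of the best remaining path.
    released.extend(nodes[0].path)
    return released


def decode(indices: Sequence[int], levels: Sequence[float],
           a: float = 0.9) -> List[float]:
    """Receiver: rebuilds the signal from the indices alone."""
    state, out = 0.0, []
    for index in indices:
        state = a * state + levels[index]
        out.append(state)
    return out


# Hypothetical quantizer and input, for illustration only.
levels = [-1.0, -0.25, 0.25, 1.0]
x = [0.3, 1.1, 1.9, 1.2, 0.1, -0.8, -1.5, -0.6]
greedy = ml_encode(x, levels, M=1, L=1)   # no look-ahead
tree = ml_encode(x, levels, M=4, L=7)     # deferred decisions
print(greedy)
print(tree)
```

In this sketch, setting either M = 1 or L = 1 reduces the search to ordinary sample-by-sample quantization, and the decoder is the same in every case, since it acts only on the released indices.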


The (M,L) algorithm has been studied by a number of investigators
in connection with simpler quantization schemes, such as adaptive delta
modulation (ADM) and adaptive differential pulse code modulation (ADPCM).

For example, Jayant and Christensen reported a 3 dB improvement in the
signal-to-noise ratio (SNR) using the (M,L) algorithm with simple ADM
and ADPCM schemes, with M=4 and L=7. In contrast, our investigation
concentrated on the use of the (M,L) algorithm in connection with adaptive
residual coding (ARC). It was originally envisioned that the resulting
algorithm would be embedded within a pitch extraction loop. Some
preliminary work was done, however, without the pitch extraction loop in
order to verify the tree coding software and to gain some insight into
its operation. For example, by using a four level fixed quantizer and
a third order fixed predictor in the ARC algorithm, an improvement of
3 dB in the SNR was achieved for M=4 and L=7. Unfortunately, the results
obtained using the standard ARC algorithm did not show as great an
improvement in SNR.

In  order  to  make  the  following  results  more  understandable  though, 
a  comment  is  necessary  here.  One  of  the  key  elements  of  the  ARC  algorithm 
is  the  source  coding,  which  translates  quantizer  levels  into  bit  patterns. 
The  selection  of  a  good  source  code  is  dependent,  however,  on  the  statistics 
of  that  which  is  to  be  encoded.  As  a  result,  the  performance  of  the  (M,L) 
algorithm  was  evaluated  without  use  of  a  source  coder.  Instead,  the 
entropy  of  the  quantizer  levels  was  computed,  which  allows  for  a  fair 
comparison  between  the  possibilities.  In  general,  a  higher  entropy  allows 
better  performance. 


Some of the first results obtained were for the first second of Sentence
1, "Cats and dogs each hate each other". In this series of runs, a
five level quantizer was used in the standard ARC algorithm, M was set
to five, and L was varied. The results are shown in Table 5.1. These
results illustrate the typical effect of tree coding: increasing L
improves performance by increasing the SNR and decreasing the entropy
simultaneously. The results also illustrate another phenomenon of tree
coding: that tree coding becomes less effective in improving performance
as the performance of the base algorithm improves.

More results were obtained for the first second of Sentence 1 using
a 19 level quantizer. The results are shown in Table 5.2. These results
again show that, in general, increasing L or M (or both) improves
performance, but that the amount of improvement is smaller than that
achievable with the poorer-performing five level quantizer. There are,
however, several instances where it appears that increasing L or M has
decreased performance. This can occur because tree coding can make a
suboptimal decision, since the set of possible decisions is deliberately
limited. As a result, tree coding can occasionally get "fooled".

Simulations  were  also  performed  with  the  tree  coding  ARC  algorithm 
embedded  within  a  pitch  extraction  loop.  Some  typical  results  are  shown 
in  Table  5.3.  These  results  were  obtained  from  Sentence  1,  using  an  11 
level  quantizer  and  M=5.  Performance  is  again  generally  improved  by  tree 
coding,  but  by  only  a  small  amount. 


Table  5.1.  Results  for  a  five  level  quantizer,  M=5. 

    L        SNR (dB)      ENTROPY (BITS/SAMPLE)

    1         14.03              1.423
    2         14.60              1.421
    4         14.66              1.407
    8         14.98              1.403
   16         15.08              1.393


Table  5.2.  Results  for  a  19  level  quantizer. 


Index values printed in the table heading: 1 2 3 4 5 6 7 8 9 10

[The row and column position of each entry is not recoverable from the
source; the entries are listed in the order in which they appear.]

   SNR (dB)    Entropy (bits/sample)

    21.30          2.263
    21.30          2.262
    21.30          2.262
    21.30          2.263
    21.45          2.258
    21.57          2.250
    21.69          2.250
    21.61          2.265
    21.88          2.244
    21.89          2.242
    21.89          2.248
    21.71          2.245
    21.80          2.246
    21.30          2.263
    21.45          2.258
    21.57          2.250
    21.72          2.246
    21.90          2.249
    21.45          2.258
    21.63          2.248
    21.63          2.243
    21.89          2.244
    21.44          2.257
    21.64          2.253
    21.75          2.244
    21.75          2.243

Key: upper number is SNR (dB), lower number
is entropy (bits/sample)
Table 5.3. Results for an 11 level quantizer, with
M=5, embedded in a pitch removal loop.


    L     SNR inside pitch     SNR overall     Entropy
             loop (dB)            (dB)         (bits/sample)

    1          13.75              18.41           1.372
    2          14.01              18.63           1.355
    3          14.29              18.89           1.369
    4          14.36              18.97           1.353
    5          14.22              18.83           1.350
    6          14.42              19.05           1.353



5.3 Adaptive Tree Coding


5.3.1 Description

Adaptive tree coding was developed as an attempt to gain the
improvement in performance of the (M,L) algorithm without using as
many computations. It was felt that the performance improvement was
desirable, but that the large number of computations was a costly
trade-off and, more importantly, could not be done in real time. This led
to the search for some way of improving the performance/computation ratio.

Adaptive tree coding was the result of that search. It is based on the
fact that the receiver does not need to know to what extent, if any,
tree coding is being performed at the transmitter. The receiver acts only
upon the quantizer levels it receives; it does not matter how those
quantizer levels were arrived at. So the basic concept of adaptive tree
coding is to use tree coding only when it appears to make sense.

Two  strategies  for  adaptive  tree  coding  were  developed  from  this  basic 
concept.  The  first  strategy  was  to  perform  additional  tree  pruning,  so 
that  a  node  would  have  to  have  a  reasonable  chance  of  being  selected  to  be 
kept.  Specifically,  a  node  would  be  pruned  if  the  value  of  the  criterion 
for  it  were  worse  than  the  value  of  the  criterion  for  the  best  node,  multiplied 
by  an  arbitrary  factor.  In  this  way,  when  growing  the  next  level  of  nodes, 
time  would  not  be  spent  growing  nodes  which  had  a  high  probability  of  being 
pruned  eventually. 

The second strategy was to "turn off" the coding (by setting M equal to
1) when the system was performing well, and "turn it back on" when it was not
performing well. The idea behind this strategy was that if the system were
performing well, there was little to be gained from tree coding. This
strategy was implemented by checking the value of the criterion for the
best new node. If tree coding was in use, it would be turned off once the
value of the criterion for the best new node was better than some arbitrary
threshold. If, on the other hand, tree coding was not in use, then tree
coding would not be used until the value of the criterion for the best new
node was worse than some second arbitrary threshold.

Both of these strategies were employed in our simulation of adaptive
tree coding, because they are somewhat complementary in nature: the second
strategy provides "coarse tuning", and the first strategy provides "fine
tuning".
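
The two strategies amount to a few comparisons per sample. The Python sketch below is a hedged illustration only: the function names, the argument names `off_below` and `on_above`, and the convention that the criterion is an error measure (lower is better) are assumptions, not details taken from the simulation program.

```python
from typing import List, Sequence


def update_tree_mode(tree_on: bool, best_cost: float,
                     off_below: float, on_above: float) -> bool:
    """'Threshold' strategy: a two-threshold switch with hysteresis.

    Tree coding is turned off when the best new node is better (lower
    cost) than off_below, and turned back on only when the best new node
    is worse (higher cost) than on_above.
    """
    if tree_on and best_cost < off_below:
        return False
    if not tree_on and best_cost > on_above:
        return True
    return tree_on


def select_survivors(costs: Sequence[float], M: int, factor: float,
                     tree_on: bool) -> List[int]:
    """Return indices of the new nodes to keep, best first.

    'Factor' strategy: among the M best nodes, prune any node whose cost
    is worse than factor times the cost of the best node. With tree
    coding off, only the best node is kept (equivalent to M = 1).
    """
    order = sorted(range(len(costs)), key=lambda i: costs[i])
    if not tree_on:
        return order[:1]
    best = costs[order[0]]
    return [i for i in order[:M] if costs[i] <= factor * best]


# Hypothetical criterion values for five new nodes.
costs = [4.0, 1.0, 1.4, 9.0, 1.6]
kept = select_survivors(costs, M=5, factor=1.5, tree_on=True)
print(kept)
```

Averaging `len(kept)` over all samples gives a measure of the computation actually spent, since the next level is grown only from the nodes that survive.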


5.3.2  Results 


Results  were  first  obtained  for  adaptive  tree  coding  using  only  the 
second  ("threshold")  strategy.  Some  typical  results  from  these  runs  are 
shown  in  Table  5.4.  Tree  coding  was  used  for  26%  of  the  samples  in 
Sentence  1,  and  for  14%  of  the  samples  in  Sentence  11.  The  results 
indicate  the  ability  of  adaptive  tree  coding  to  improve  performance  with 
a  small  increase  in  computation. 

Results were next obtained for adaptive tree coding using only the
first ("factor") strategy. A sample of the results is shown in Table 5.5.
On these runs, Sentence 1 was used as the input, and M was set to 5. A
measure of the decrease in computations is "effective M", which is the
average number of nodes retained for each new sample. It can clearly be
seen that the number of calculations can be decreased dramatically while
still retaining much of the increased performance. In fact, it can be
seen from the data that in some cases the performance was improved by
adaptive tree coding. One possible explanation for this phenomenon is
that, in those cases where adaptive tree coding increased performance, the
additional pruning eliminated nodes which appeared to be good, but were
not in the long run.

Finally, a series of simulation runs was made which utilized both
adaptive strategies simultaneously. The results indicated that the two
strategies were complementary, in that use of both was better than the use
of either alone. Some representative results are shown in Table 5.6. It
can be seen that about the same performance is obtained by adaptive tree
coding as with tree coding, with about a 66% reduction in computations.
Table 5.4 Results using the "threshold"
strategy for adaptive tree
coding.

Sentence 1 (male speaker):

                                            Adaptive
                                 PARC       tree coding    Tree coding

SNR inside pitch loop (dB)       13.77        14.13          14.42
SNR overall (dB)                 18.42        18.75          19.04
Entropy (bits/sample)            1.374        1.374          1.356

Sentence 11 (female speaker):

                                            Adaptive
                                 PARC       tree coding    Tree coding

SNR inside pitch loop (dB)       12.84        13.35          13.64
SNR overall (dB)                 20.88        21.40          21.70
Entropy (bits/sample)            1.386        1.374          1.338



Table  5.5  Results  using  the  "factor" 
strategy  for  adaptive  tree 
coding. 


FACTOR              Overall       Entropy          Effective
                    SNR (dB)      (bits/sample)    M

1.E6  (60dB)         18.92          1.347           5.00
10.00 (10dB)         18.92          1.347           4.46
7.94  (9dB)          18.92          1.348           4.43
6.31  (8dB)          18.92          1.348           4.39
5.01  (7dB)          18.92          1.348           4.33
3.98  (6dB)          18.92          1.348           4.24
3.16  (5dB)          18.94          1.349           4.10
2.51  (4dB)          19.09          1.343           3.84
2.00  (3dB)          19.08          1.348           3.41
1.58  (2dB)          19.07          1.356           2.62
1.50  (1.75dB)       19.05          1.354           2.40
1.41  (1.50dB)       18.94          1.360           2.10
1.33  (1.25dB)       18.94          1.347           1.85
1.26  (1dB)          18.82          1.354           1.58
1.00  (0dB)          18.42          1.374           1.00

Table  5.6  Results  using  both  strategies  for 
adaptive  tree  searching. 


                                   Overall       Entropy
                                   SNR (dB)      (bits/sample)

PARC                                18.42          1.374

Adaptive tree coding                18.92          1.360
  HITHR = 2416 (15dB),
  LOTHR = 304.1 (24dB),
  FACTOR = 1.50 (1.75dB)

Tree coding                         18.92          1.347


5.4 Conclusions and suggestions for further research


It  would  appear  that  on  the  basis  of  the  results  obtained,  tree 
coding  is  an  effective  way  to  increase  the  performance  of  a  quantization 
system.  The  major  question  though,  is  not  whether  it  is  effective, 
but  how  effective  it  is  relative  to  what  it  costs.  It  would  appear  that 
tree  coding  may  not  be  effective  in  this  sense,  but  that  some  form  of 
adaptive  tree  coding  may  be.  The  question  was  moot  for  this  project, 
because  there  was  not  sufficient  real  time  to  perform  a  tree  coding 
version  of  PARC. 

The reason why tree coding appears to be relatively ineffective with
PARC may be that the ARC algorithm has been overly adapted for PARC; that
is, a set of parameters optimized for PARC will most probably be a poor
choice for use with tree coding. This makes a lot of sense if you
think about how tree coding can help a system. For the PARC algorithm to
work well, the parameters are chosen so that typically only one quantizer
level results in a reasonable amount of error. To do otherwise would be
suboptimal for PARC, since only one quantizer level can be selected, and
having more than one reasonable quantizer level would simply reduce the
dynamic range of the quantizer or increase the average quantization error.
In contrast, for tree coding to be effective, several quantizer levels
should be reasonable for each sample.

In order to modify the ARC parameters to make tree coding more effective,
then, it would appear desirable to select parameters which would decrease
granular noise and let the tree coding reduce slope overload noise.
Specifically, it would seem to be desirable to "move in" the output
and scaling factors of the quantizer, decrease the updating gains, and
reduce the "time constants" of the system. In this way, there would be a
richer selection of quantizer levels for the tree code.

There are also several other areas to be researched. In our
simulations, the tree coding took place entirely within the pitch loop.
It might be interesting, therefore, to investigate the effect of basing
the pruning criterion on the reconstructed speech rather than the
reconstructed depitched speech. Doing this might aid in the smooth
transition from one pitch block to the next. Another idea which could be
investigated would be delayed updating of the ARC algorithm. In delayed
updating, the algorithm would be updated on the basis of the quantizer
level just decided upon (corresponding to the sample L samples earlier),
so that the updating could be done once for all nodes, eliminating many
computations.

Tree  coding  might  also  be  made  more  effective  by  making  changes  in 
the  tree  coding  algorithm  itself.  One  possible  path  for  investigation 
would  be  in  the  area  of  variable  symbol  release,  as  developed  by  Goris. 
Another  possible  path  would  involve  investigating  different  forms  of  the 
pruning  criterion.  For  example,  it  might  improve  performance  to  weigh 
the  contribution  from  earlier  samples  more  heavily  than  that  from  recent 
samples,  because  it  would  appear  that  the  "soft"  decisions  become  "harder" 
CHAPTER 6

BACKWARD PITCH EXTRACTION ADAPTIVE RESIDUAL
CODER (BPARC)

6.1  Introduction 

Many of the practical systems for digitizing speech are variants
of differential pulse code modulation (DPCM). The speech coder
developed for the 9.6 kb/s bit rate uses this structure augmented by
a pitch extraction loop. This algorithm is called Pitch extraction
Adaptive Residual Coder (PARC).

The system described in this chapter is identical to PARC except for
the method of pitch extraction. In PARC, pitch is extracted block by
block from raw speech. Once the correlation coefficient β and pitch
period T are known, reduced speech is formed, processed by the coder
and transmitted. The β's and T's also need to be transmitted; the number
of bits required to transmit these parameters depends on the number of
parameters available (this depends on block length), the type of coding
employed and the way they are transmitted. In addition to the number
of bits required for transmission, the transmission of these parameters
necessitates a framing of the bit stream and an associated frame
synchronization problem.

In order to avoid this framing problem, a backward adaptive approach
was investigated which would not require transmission of β's and T's.
Since the values of β and T change very little in a given voiced region,
these parameters can be calculated for a previous block of speech and
used for the current block to reconstruct speech. As noted in an earlier
chapter, short pitch blocks provide the best performance. However, these
short blocks require the transmission of a large amount of side information
(the β's and T's), making them impractical. The use of the backward
identification of β and T eliminates this problem.

In the following sections, this approach, referred to as Backward PARC
(BPARC), is described along with various computer simulation studies
comparing it with the standard PARC algorithm. A complete listing of the
source program and a flow chart for the simulation are also included in
this chapter.


6.2 System Structure


Figure 6.1 shows the block diagram of the backward adaptive pitch
extraction residual coder. A comparison of this figure with Fig. 2.1
reveals the obvious difference that β and T are now computed from the
reconstructed samples ŝ(k) rather than from the original samples s(k).
As a result of this change, it is no longer necessary to transmit β and T
since, in the case of no transmission errors, the receiver can carry out
the same computation. Note that the receiver must now be more complex,
since it must be capable of computing β and T.

The use of ŝ(k) to compute β and T causes another, less obvious
change in the transmitter. In order to use the ŝ(k)'s, the computation
of β and T must be based only on past speech estimates. In the forward
adaptive case, the computation of β and T is based on both future and
past speech samples. The basic approach for BPARC is to compute β and T
on a block of ŝ(k), k = k_0, k_0 - 1, ..., k_0 - K_c, and then to use this
value of β and T to form reduced speech v(k) for
k = k_0 + 1, k_0 + 2, ..., k_0 + K_u. Here K_c is called the computation
block size and K_u is the use block size; these two values are not
necessarily equal.

The basic philosophy of the BPARC approach is that in voiced segments,
where pitch redundancy removal is most effective, β and T do not change
rapidly. Table 6.1 illustrates this fact with a listing of β and T for
a typical segment of voiced speech. The use of a β and T, computed for one
block, in the next block will not have much effect on the performance in
such a voiced segment.

There will, however, be rapid changes in β and T during transitions
from voiced to unvoiced or unvoiced to voiced speech. During these
transitions, significant performance degradation can be expected in BPARC


Table  6.1 


Values of β and T for a Segment of Voiced Speech


   Sample
   Number         β          T

    5490         0.91        30
    5520         0.80        30
    5550         0.06        30
    5580         0.92        30
    5610         0.99        30
    5640         0.95        30
    5670         1.04        30
    5700         0.98        30
    5730         0.98        30
    5760         1.02        30
    5790         1.03        30
    5820         1.00        30
    5850         1.02        30
    5880         1.00        30
    5910         1.06        30
    5940         1.06        30
    5970         1.04        30
    6000         1.09        30
    6030         1.08        30
    6060         1.09        30
    6090         1.05        30
    6120         1.06        30
    6150         1.06        30
    6180         1.10        30
    6210         1.06        30
    6240         1.06        30
    6270         1.03        30
    6300         1.02        30
    6330         1.02        30
    6360         1.00        30
    6390         1.00        30
    6420         0.98        30
    6450         0.99        30
    6480         0.97        30
    6510         0.95        30
    6540         0.93        30
    6570         0.94        30
    6600         0.94        30
    6630         0.87        30
    6660         0.89        30
    6690         0.05        30
    6720         0.04        30



Fig. 6.2 Backward Pitch extraction Adaptive
Residual Coder (BPARC)


over the PARC algorithm. The hope was that the decrease in quantization
noise caused by removing the β and T transmission would offset this
degradation. The advantage of not requiring framing for β and T is also
obvious.

Exactly the same method, namely Eqs. (2.6) and (2.7), is used to
compute β and T for BPARC as for the standard PARC, except that ŝ is used
in place of s and all ŝ's are past values. Hence T at stage k is the
value of τ which minimizes the AMDF function given by

    A(\tau) = \frac{1}{K_c} \sum_{j=k-K_c}^{k}
              | \hat{s}(j) - \hat{s}(j-\tau) | ,
              \quad \tau = 20, 21, \ldots, \tau_{\max}            (6.1)

Once T is known, β is determined from

    \beta = \frac{\sum_{j=k-K_c}^{k} \hat{s}(j) \, \hat{s}(j-T)}
                 {\sum_{j=k-K_c}^{k} \hat{s}(j-T) \, \hat{s}(j-T)}   (6.2)



A  block  diagram  for  the  complete  BPARC  algorithm  is  given  in  Fig.  6.2. 


This  algorithm  assumes  that  a  complete  sentence  is  read  in  and  then 
processed.  This  algorithm  was  programmed  on  the  PDP-11/60  in  order  to 
determine  its  performance  characteristics. 
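
The backward computation of Eqs. (6.1) and (6.2) can be sketched as follows. This is an illustrative Python sketch, not the PDP-11/60 program used in the study. The function name, the default block length and the lag limits are assumptions chosen for the example. Only reconstructed samples up to stage k are used, so a receiver could repeat the same computation without side information.

```python
import math

def backward_pitch(s_hat, k, K_c=30, tau_min=20, tau_max=80):
    """Return (T, beta) for stage k using only reconstructed samples up to k.

    s_hat   : list of reconstructed speech samples (assumed available to both ends)
    K_c     : computation block length (assumed value)
    tau_min, tau_max : lag search range (assumed values)
    """
    if k - K_c - tau_max < 0:
        raise ValueError("not enough past samples for the requested lag range")
    window = range(k - K_c, k + 1)  # j = k-K_c, ..., k as in Eq. (6.1)

    def amdf(tau):
        # Average magnitude difference between the window and its delayed copy
        return sum(abs(s_hat[j] - s_hat[j - tau]) for j in window) / K_c

    # T is the lag with the smallest AMDF; on ties the smallest lag is kept
    T = min(range(tau_min, tau_max + 1), key=amdf)

    # Eq. (6.2): gain of the pitch predictor at lag T
    num = sum(s_hat[j] * s_hat[j - T] for j in window)
    den = sum(s_hat[j - T] ** 2 for j in window)
    beta = num / den if den > 0.0 else 0.0
    return T, beta

# Synthetic check: a tone whose period is 30 samples
tone = [math.sin(2 * math.pi * n / 30) for n in range(400)]
T, beta = backward_pitch(tone, k=300, tau_max=50)
```

The search range in the usage lines is kept below twice the period so that the multiple at 60 samples cannot tie with the true lag.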


6.3  Performance  Evaluation  and  Parametric  Studies 


As the first step in evaluating the performance of the BPARC
algorithm, a comparison of β, T and SER using the normal (forward) PARC
and the BPARC algorithms was made. Table 6.2 shows a typical portion
of the results from this study. Note that SER is significantly degraded
in the transition regions at the beginning and end. There is some
decrease (~1 db) in SER in the voiced segment. A computation block length
of 30 samples was used for both cases. This degradation in SER is
understandable, since the β and T used for the present block of samples
were calculated from previously processed speech. In transition
regions such as V/UV, UV/S and S/V, the previously processed speech block can
be much different from the current block of speech to be processed.

A study of the effect of different lengths of "computation block" and
"use block" on SNR was conducted next. In general, SNR goes down for very
large "computation" or "use" blocks, while an increase in performance was
noticed for smaller block lengths. However, performance does not vary
monotonically with block length. The results are tabulated in Table 6.3.

From the very basic idea of BPARC, it is clear that this approach
should work better for voiced speech. This was indeed observed
when the range of possible β's was limited to correspond to voiced speech.
Table 6.4 shows the improvement in SNR obtained by restricting β.

Table 6.4

Effect of Limiting the Range of β
(Sent 11, Female Speaker, K_u = K_c = 30)

                 -2 < β < 2        0.74 < β < 1.14
SNR              19.96 db          20.69 db
SNR(inloop)      15.00 db          14.28 db
SER               4.96 db           6.41 db
H                 1.51 b/sample     1.46 b/sample

In the above approach, β was set to zero if it was not in the allowed
range. It appeared that the resulting sharp changes in β would degrade
performance. To eliminate this problem, a limit was set on the maximum
percentage change in β so that β would vary more smoothly in transition
regions. This approach increased SER, as shown in Table 6.5, but did not
improve overall performance, since the inloop SNR decreased.


Table 6.5

Effect of Limiting Δβ
(Sent 1, male speaker, block lengths 45 and 30, 0.65 ≤ β < 1.25)

                 Δβ = 0            |Δβ| < 10%        |Δβ| < 20%
SNR              17.78 db          17.57 db          17.70 db
SNR(inloop)      14.84 db          14.45 db          14.67 db
SER               2.94 db           3.12 db           3.02 db
H                 1.49 b/sample     1.50 b/sample     1.49 b/sample

To calculate the pitch parameter T, the function

\[ A(T) = \sum_{j=1}^{K} \left| \hat{s}(j) - \hat{s}(j-T) \right| \]

was minimized with respect to T for 20 ≤ T ≤ L, where L ≥ computation
block length = 30. Different values of L were tried. Some improvement
in performance was noticed (see Table 6.6) as the search range was decreased,
up to a certain point. For a long search range, the effect of errors
made while reconstructing previous samples becomes more severe,
while a short search range may not be enough to detect the correct period T.


Table 6.6

Effect of Search Range for T
(Sent 1: male speaker, computation block length = 30)

                 20 ≤ T ≤ 100      20 ≤ T ≤ 80       20 ≤ T ≤ 70
SNR              17.67 db          17.78 db          17.78 db
SNR(inloop)      14.87 db          14.84 db          14.75 db
SER               2.80 db           2.94 db           3.03 db
H                 1.49 b/sample     1.49 b/sample     1.49 b/sample


As discussed earlier, the normal PARC algorithm must transmit β's and
T's along with the quantized residual of the reduced speech. This, of
course, costs a fraction of a bit per sample. If a block size of 100
is used and β and T require 13 bits, the overhead is 0.13 bits/sample. That
leaves 1.37 bits/sample for transmission of other information. Hence,
to make a fair comparison between the two algorithms, H was limited to
1.37 bits/sample for PARC and 1.5 bits/sample for BPARC. Even though BPARC
is not a clear winner (see Table 6.7), it is a very attractive solution to
the problems of transmitting β and T.
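
The bit budget above can be written out explicitly. The total of 1.5 bits/sample is taken here to follow from the 9600 b/s channel rate and the 6.4 kHz sampling rate quoted elsewhere in this report; that reading is an assumption, since this section states only the resulting figures.

\[ \frac{9600\ \text{b/s}}{6400\ \text{samples/s}} = 1.5\ \text{bits/sample}, \qquad \frac{13\ \text{bits}}{100\ \text{samples}} = 0.13\ \text{bits/sample}, \qquad 1.5 - 0.13 = 1.37\ \text{bits/sample}. \]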


Table 6.7

Comparison of PARC and BPARC Algorithms

                        SNR         SNR inloop    SER         H
Sentence 1    PARC      19.11 db    13.63 db      5.48 db     1.37 b/sample
              BPARC     17.78 db    14.84 db      2.94 db     1.49 b/sample
Sentence 11   PARC      21.40 db    12.95 db      8.45 db     1.36 b/sample
              BPARC     20.53 db    14.21 db      6.31 db     1.47 b/sample

6.4  Transmission  Error  Studies

No speech encoding algorithm is useful unless it can tolerate channel
errors. To study the effects of channel errors, the first step is to determine
the effect of errors in the quantizer levels. Errors introduced
in the quantizer levels do not correspond exactly to errors caused by bit
reversal, but they are nevertheless a good measure of the algorithm's
susceptibility to channel errors.

The following procedure was adopted to study the effect of channel errors
in the BPARC algorithm:

(i)  The receiver program was extracted from the transmitter program.

(ii)  A quantizer output file was created with the desired transmission
errors.

(iii)  This file was read by the receiver and a file of reconstructed
speech ŝ(k) was created.

(iv)  SNR was calculated between the original speech and the received speech.

The algorithm tolerated one transmission error (for the male speaker, SNR
goes down from 19.52 to 19.48 db) but becomes unstable with additional errors.
It was suspected that wrong ŝ's lead to wrong β's and possibly wrong T's,
which in turn make the ŝ's wrong. To investigate the cause, the BPARC program
was run with no pitch removal. The algorithm then works very well (see Table 6.8)
even with a 1% transmission error rate, thus confirming this suspicion.

Table 6.8

Effect of Transmission Errors on BPARC with no Pitch Extractor
(Sentence 1: Male speaker, SNR with no transmission error = 18.83 db, H = 2.18)

Transmission error rate      SNR
0.01%                        18.82 db
0.1%                         18.37 db
1%                            4.25 db

A fixed predictor instead of the adaptive predictor was also tried in the
system. It was found that a fixed predictor reduces the effect of
transmission errors to some extent.

Table 6.9 lists β's and T's with and without transmission error.
Note that β changes immediately after the error has been introduced. The plots
in Figs. 6.3 and 6.4 compare the effect of transmission error on local SNR

Table 6.9

Effect of Transmission Error (Sentence 11: Female speaker)

              Without                With transmission error at
              transmission error     sample number 1674, 7075
Sample
Number          β       T              β       T
 1290         0.88     27            0.88     27
 1320         0.94     27            0.94     27
 1350         0.97     55            0.97     55
 1380         0.98     27            0.98     27
 1410         0.98     27            0.98     27
 1440         0.95     27            0.95     27
 1470         0.99     27            0.99     27
 1500         0.99     27            0.99     27
 1530         0.92     27            0.92     27
 1560         0.94     27            0.94     27
 1590         0.94     27            0.94     27
 1620         0.97     55            0.97     55
 1650         0.98     55            0.98     55
 1680         1.01     55            1.01     55
 1710         1.00     55            0.99     55   ←
 1740         1.02     55            1.02     55
 1770         1.02     55            1.02     55
 1800         1.00     55            0.99     55
 1830         0.93     55            0.93     55
 1860         0.89     55            0.88     55
 1890         0.81     27            0.80     27
 1920         0.83     27            0.82     27
 1950         0.87     27            0.86     27
 1980         0.91     27            0.91     27
 2010         0.91     27            0.91     27
 2040         0.90     27            0.91     27
 2070         0.90     28            0.91     28
 2100         0.87     28            0.89     28
 2130         0.00     28            0.65     28
 2160         0.70     57            1.02     57
 2190         0.70     28            1.01     57
 2220         0.00     27            0.85     57
 2250         0.00     24            0.84     57
 2280         0.00     25            0.84     57
 2310         0.00     20            0.84     57
 2340         0.00     20            0.84     57
 2370         0.00     20            0.84     57
 2400         0.00     20            0.84     57
 2430         0.00     20            0.84     57
 2460         0.00     20            0.84     57
 2490         0.00     20            0.84     57
 2520         0.00     20            0.84     57
 2550         0.00     20            0.84     57
 2580         0.00     20            0.84     57
 2610         0.00     20            0.84     57
 2640         0.00     20            0.84     57
 2670         0.00     20            0.84     57
 2700         0.00     20            0.84     57
 2730         0.00     20



6.5  Conclusions 


The study of the BPARC algorithm presented in this chapter shows
that it is an attractive approach for a fairly error-free channel, but
that it is not suitable for a channel with an error rate of 0.1% or more.
The following changes in the algorithm might help to make it more robust.

1.  Currently, β and T are calculated for the computation block (using received
speech) and then used for the current block to reconstruct speech. It
is then possible to use the newly received speech samples of the current
block to re-calculate β and T. These values of β and T should be closer
to the true values for that block.

2.  Maximum changes in β could be limited, thus reducing the effect of channel
errors.

3.  Errors in pitch period (in voiced regions) due to channel errors
could be detected by looking at the T's of previous blocks and then
corrected.
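
Suggestions 2 and 3 could take a form like the sketch below. It is purely illustrative: the function names, the 10% step limit, the five-block history and the three-sample tolerance are all assumptions made for the example, and none of this was evaluated in the study.

```python
from statistics import median

def limit_beta_change(beta_prev, beta_new, max_frac=0.10):
    """Clamp the block-to-block change in beta to +/- max_frac of the previous value.

    max_frac is an assumed tuning value (10% here).
    """
    if beta_prev == 0.0:
        return beta_new  # no reference value to limit against
    step = max_frac * abs(beta_prev)
    return max(beta_prev - step, min(beta_prev + step, beta_new))

def correct_pitch(T_history, T_new, tolerance=3):
    """Replace T_new by the median of recent T's when it deviates too far from them.

    T_history : pitch periods of previous blocks (most recent last)
    tolerance : largest accepted deviation in samples (assumed value)
    """
    if len(T_history) < 3:
        return T_new  # too little history to judge
    T_med = int(median(T_history[-5:]))
    return T_new if abs(T_new - T_med) <= tolerance else T_med

beta_limited = limit_beta_change(1.0, 0.65)       # change limited to 10% of 1.0
T_checked = correct_pitch([27, 27, 27, 27], 57)   # isolated jump is replaced
```

A median over a short history follows genuine slow drift of the pitch period while rejecting an isolated jump. It would also suppress legitimate abrupt changes at voicing transitions, so any real use would need a voicing decision alongside it.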

Unfortunately, time did not permit an examination of these ideas.

CHAPTER  7
TANDEM  OPERATION

7.1  Introduction 

One of the requirements of the speech coding algorithm is that it
should perform satisfactorily in tandem with a CVSD speech coder
operating at a data rate of 16 kb/s, and that this tandem configuration
should provide speech intelligibility with minimal degradation compared
with a single link of CVSD operating at 16 kb/s. This tandem operation
was simulated and the results are discussed in Sec. 7.2.

Another requirement of the speech coding algorithm is that it should
produce intelligible speech under acoustic background noise. The
simulation of background noise and the performance of PARC are discussed
in Sec. 7.3.

7.2  PARC  in  Tandem  with  CVSD

The performance of the PARC system in tandem with the CVSD system was
studied. Raw speech was passed through the CVSD algorithm to create a
file of CVSD output speech, and this speech file was used as input to
the PARC algorithm. The output of the PARC algorithm is then the output
of the CVSD-PARC tandem system, as shown in Fig. 7.1a. Similarly, the
PARC-CVSD tandem connection shown in Fig. 7.1b was also studied. The
CVSD algorithm used for the study of tandem operation is described in
Appendix D.

The CVSD algorithm used in this study operates on input speech
sampled at 16k samples per second. The sampling rates at the input and
the output of CVSD must therefore be modified in order to make the
tandem connections with PARC, which operates on speech sampled at
6.4 kHz. The resampling can be done by using the resampling programs
listed in Appendix D. The CVSD program incorporates this resampling,
which makes the simulation less time consuming.

The performance of these tandem connections is judged by the
signal-to-quantization-noise ratio (SNR) and by subjective listening.
Sentence 1, "Cats and Dogs each hate the other", spoken by a male
speaker, was used for the simulation. Results are reported in Tables 7.1
and 7.2.


TABLE 7.1  SNR for PARC, CVSD and their interconnections.

PARC         CVSD         PARC-CVSD tandem     CVSD-PARC tandem
18.31 db     11.78 db     10.80 db             10.68 db

Fig.  7.1a  CVSD  in  Tandem  with  PARC 


Fig.  7.1b  PARC  in  Tandem  with  CVSD 


TABLE 7.2  Effect of CVSD Speech input on PARC performance

                                  SNR          Inloop SNR    SER         Entropy H
PARC with raw speech as input     18.31 db     13.69 db      4.62 db     1.41 bits/sample
PARC with CVSD speech as input    16.84 db     12.90 db      3.94 db     1.74 bits/sample

The results in Table 7.1 show that there is little degradation in SNR due to
tandem operation. In fact, the speech quality of the CVSD-PARC tandem seems
perceptually better than that of CVSD alone. This study indicated
that PARC acts as a filter for the granular noise in the CVSD output. The
performance of these tandem operations could be improved by redesigning
various parameters used in PARC. However, at this point the purpose of the
study was only to make sure that the PARC algorithm performs reasonably well
in tandem with CVSD.

It can be seen from Table 7.2 that the SNR decreased by less than 2 db
when CVSD speech was used as input instead of raw speech. This decrease in
SNR could be attributed to a decrease in predictor performance as a
result of the high frequency content of CVSD speech, and to poorer signal
energy reduction due to the decrease in correlation between CVSD speech
samples. However, this decrease in SNR is surprisingly low. This shows that
the PARC algorithm works very well for noisy input speech, except for some
increase in entropy. The entropy increase is a rather serious problem, but
it is solvable by using buffer control techniques, which are discussed in
detail in Chapter 10.

7.3  Effect  of  Background  Speakers  on  PARC  Performance

It has been observed that the PARC algorithm performs satisfactorily
in tandem with CVSD. However, the noise in CVSD speech is white. It is
also important to study the effects of correlated background noise such
as office noise.

One of the speech coding algorithm requirements is that the speech
coder shall produce intelligible speech under conditions of acoustic
background noise of 60 db referenced to 20 μNewtons/meter². This statement,
though technically precise, gives little feeling for the loudness of the
noise. Figure 7.2 [1] gives comparative intensities of a variety of common
sounds. The noise level described above is similar to quiet office noise.
Regarding sound energy, a remark by Alexander Wood [2] gives the whole
picture. In his book, "Physics of Music", he points out that the sound energy
generated by the shouting of the crowd throughout an exciting game (say,
50,000 people at a 90 minute football match between Notre Dame and USC)
is just about enough to warm one cup of coffee.

The effect of background noise, consisting of typewriter noise,
conversation, music and so forth, was studied on the real-time system. The
algorithm performs with no difficulty.

The study of background noise is very simple once the algorithm has
been implemented in real time. It is just a matter of talking into the
handset with noise in the background and listening to the output through
headphones. In the FORTRAN simulation, however, the task is not so
straightforward: a digital speech file with background noise is needed.
It was thought that periodic background noise would be the worst kind of
noise for the PARC algorithm. Therefore, it was decided to study the
performance of the algorithm for multispeaker files. Multispeaker files were
created by adding two digital speech files with appropriate weights.

In the simulation, the multispeaker file was generated by adding
sentence 11 (female speaker) to sentence 1 (male speaker), as shown in
Eq. (7.1).

\[ s_{\text{composite}} = S_{11} + k\,S_{1} \tag{7.1} \]

where k takes values from 0 to 1, thus giving a varying degree of background
noise. It was noticed that the pitch extraction loop picks up the pitch of
both speakers and that the algorithm performs very well, as can be seen from
the results in Table 7.3.
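
The construction of Eq. (7.1) amounts to a weighted sample-by-sample sum. The sketch below is a minimal Python illustration, not the FORTRAN program used in the study. The function name is hypothetical, and truncating to the shorter file is an assumption, since the report does not say how files of unequal length were handled.

```python
def mix_speakers(primary, background, k):
    """Return primary + k * background, sample by sample (Eq. 7.1).

    primary    : samples of the main speaker (S11 in the report)
    background : samples of the interfering speaker (S1 in the report)
    k          : background weight, restricted here to the range 0..1
    """
    if not 0.0 <= k <= 1.0:
        raise ValueError("k is expected to lie between 0 and 1")
    # zip() stops at the shorter input, which truncates the longer file
    return [p + k * b for p, b in zip(primary, background)]

composite = mix_speakers([1, 2, 3], [10, 20], 0.5)
```

No rescaling or clipping is applied, so a real file writer would need to check that the sum still fits the sample word length.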


TABLE 7.3  SNR's for Multispeaker Files

Multispeaker file    SNR          Inloop SNR    SER         Entropy H (bits/sample)
S11 + 0 S1           21.11 db     13.01 db      8.10 db     1.39
S11 + .25 S1         20.16 db     13.03 db      7.13 db     1.57
S11 + .5 S1          19.44 db     13.39 db      6.05 db     1.66
S11 + 1 S1           18.92 db     13.95 db      4.97 db     1.76

Fig. 7.2  Comparative intensities of a variety of common sounds,
from bottom to top in order of increasing sound pressure.

CHAPTER  8

TRANSMISSION  ERRORS

8.1  Introduction

Many  toll-quality  speech  links  maintain  bit-error  rates  (BER)  which 
are  too  small  (less  than  10  to  affect  the  quantizer  and  hence  the 
coder  performance.  However,  a  BER  of  one  tenth  of  a  percent  is  not 
uncommon  and  for  bad  channels  this  rate  could  be  as  high  as  one  percent. 
In  such  cases,  SNR  degradation  may  be  severe  unless  special  precautions
are  taken.  It  is  important  to  determine  the  extent  of  SNR  degradation 
and  if  possible  how  to  minimize  it. 

The study outlined in this chapter is an attempt to answer the
question posed above. Section 8.2 describes the method of introducing
random transmission errors. This method was simulated on a digital
computer such that the BER could be changed at run time. With the
introduction of transmission errors, the effect on SNR of various
parameters, such as predictor order, coarseness of the quantizer and
various decay constants, was observed. Simulation results are discussed
in Section 8.3.

8.2  Simulation  of  Transmission  Errors

It is assumed that the transmission errors that occur in the digital
channel shown in Fig. 8.1 are random in nature. Hence, in the
simulation it is important to ensure that errors do not occur in bursts.
Similarly, it is assumed that errors are made by bit reversal and not by
bit addition or deletion. From Fig. 8.1, it may appear that the
effect of errors introduced in the quantizer levels is similar to that of
errors introduced in the bit stream representing the quantizer levels,
provided the encoding and decoding operations are carried out correctly.
However, it must be kept in mind that samples are not gained or lost if the
errors are introduced in the quantizer levels, while they may be if errors
are introduced in the bit stream. The advantage of introducing errors in
the quantizer levels is that the effect of transmission errors can be
measured in terms of degradation of SNR, so that the influence of various
parameters on error performance can be easily evaluated. With errors in
the bit stream, SNR loses its meaning since samples may no longer be
synchronized. Of course, real transmission errors do affect the bit stream.
The following procedure was adopted for the simulation of transmission
errors.

1.  Separate programs for the PARC transmitter and receiver
were written. The program asks for the bit error rate, and the
transmitter produces a file of quantizer levels represented
by integer numbers from 1 to 11. The receiver
program reads the quantizer levels from this file and
produces a file of reconstructed speech samples.

2.  Randomized transmission errors were introduced by using the
RANDU function available in the PDP 11/60 library. Care
should be taken to make the seeds for this function large
enough that bursts of errors do not occur at the beginning.

3.  The original speech file and the reconstructed speech file are
compared and SNR is calculated. This procedure is
repeated for different values of the parameters.

Fig. 8.1  Speech Coder (quantizer levels q, digital channel, PARC receiver)
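
The procedure above can be sketched in a few lines. This is an illustrative Python sketch rather than the original simulation: Python's `random.Random` stands in for RANDU, the function names are hypothetical, and replacing an erroneous level by a uniformly chosen different level is an assumption, since the report does not state how the wrong level was selected.

```python
import math
import random

def inject_level_errors(levels, error_rate, n_levels=11, seed=12345):
    """Return (corrupted_levels, error_positions).

    Each quantizer level (an integer 1..n_levels) is independently replaced,
    with probability error_rate, by a different level chosen uniformly.
    No samples are added or deleted, so the files stay synchronized.
    """
    rng = random.Random(seed)
    corrupted = list(levels)
    positions = []
    for i, q in enumerate(corrupted):
        if rng.random() < error_rate:
            wrong = rng.randint(1, n_levels - 1)   # one of the other n_levels-1 levels
            corrupted[i] = wrong if wrong < q else wrong + 1
            positions.append(i)
    return corrupted, positions

def snr_db(original, reconstructed):
    """SNR in db between the original and the reconstructed speech files."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    return float("inf") if noise == 0 else 10.0 * math.log10(signal / noise)

levels = [6] * 10000
corrupted, positions = inject_level_errors(levels, error_rate=0.01)
```

In a full simulation the corrupted level file would be passed to the receiver program, and `snr_db` would then compare its output with the original speech.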
8.3  Minimizing  Transmission  Error  Effect

Because of an error in the quantizer level q(k), the reconstructed reduced
speech v(k), and hence the reconstructed speech sample s(k), is erroneous.
This makes e(k) erroneous as well, which in turn causes the a_i's (predictor
parameters) to be incorrect. The effect of e(k) on the a_i's can be
observed from the coefficient update equation of Chapter 2, which is
repeated here for convenience.

\[ a_i(k+1) = a_i(k) + \frac{g\, v(k-i)\, e(k)}{(1-\alpha) \sum_{j=0} \alpha^{j}\, v(k-j) + \mathrm{RMSMIN}} \]

The effect of an erroneous e(k) on the updating of the a_i's can be reduced
by increasing RMSMIN and decreasing g. This can be seen from Tables 8.1 and 8.2.

Table 8.1

Male speaker, sentence 1: Bit Error Rate 1 in 100

g         SNR without error     SNR with error
0.015     19.25 db              5.83 db
0.01      19.07 db              6.49 db

Table 8.2

Male speaker, sentence 1: Bit Error Rate 1 in 100

RMSMIN    SNR without error     SNR with error
70        18.96 db              7.18 db
65        19.05 db              6.95 db
55        19.30 db              8.04 db
52        19.29 db              7.73 db
30        19.83 db              3.66 db

The predictor output is a linear combination of past v's and is given by

\[ p(k) = \sum_{i=1}^{N} a_i(k)\, v(k-i) \]

where N is the order of the predictor.

For a high predictor order the effect of transmission errors is greater, since
a comparatively larger number of incorrect predictor coefficients and a larger
number of incorrect previous samples contribute to the predictor output. Since
both the a_i's and the v's are used in constructing p(k), the effect of
transmission errors grows rather seriously with predictor order. It was
observed that decreasing the predictor order from 8 to 4 improved the SNR
(for a BER of 1 in 100) by 3 db.

Various decay constants, such as α, the exponential decay constant for the
RMS value calculation, and δ, the decay constant for updating the predictor
parameters, also affect the performance of the system with transmission
errors. The choice of α controls the effective interval that contributes to
the RMS estimate. This interval is larger for a syllabic system and smaller
for an instantaneous system. At both extremes, large α (syllabic) and small α
(instantaneous), SNR decreased (see Table 8.3).

The decay constant δ is used in updating the predictor parameters to prevent
transmission errors from propagating. Larger values of δ improve the system
performance with errors, as can be seen from Table 8.3.


Table 8.3

Parameters     SNR with no error     SNR with 1% error rate
α = 0.97       19.28 db              6.13 db
α = 0.9        18.96 db              7.18 db
α = 0.8        18.89 db              5.01 db
*δ = 0.02      19.46 db              overflow
 δ = 0.04      19.21 db              -0.78 db
 δ = 0.06      19.18 db              1.63 db

*With all other parameters optimized for good error performance, the
degradation is not so severe with the above values of δ.

Quantizer noise is also an important factor affecting the performance
of the system with transmission errors. The closer the quantizer levels,
the less the quantization noise and hence the less the effect of
transmission errors. Table 8.4 shows that SNR decreased when the output
levels were spread further apart.

Table 8.4

Symmetric quantizer. Bit error rate 1 in 100.

Output levels                  SNR without error     SNR with error
0  1.8  4.25  6.5  8   12      20.13 db              10.18 db
0  1.9  4.5   7.5  10  12      19.42 db               7.65 db

To see the effect of various error rates and the effect of errors in
different segments of speech, random errors with 1% and 0.1% error rates
were added using different random sequences. The study has shown that a 0.1%
error rate causes little degradation, while the degradation is significant
for a 1% error rate. However, the output speech was found to be intelligible
even at a BER of 1%. It was also noticed that if an error occurs in a silence
segment of speech its effect is negligible. Considering that 40 to 60%
of speech is silence, the effect of a small BER is not significant, as can be
seen from Table 8.5.


Table 8.5

Case 1: SNR at transmitter = 20.13 db

                       Error Rate 1%     Error Rate 0.1%
Random Sequence #1     7.13 db           15.14 (15 errors)
Random Sequence #2     3.57 db           20.00 (7 errors)
Random Sequence #3     9.78 db           14.34 (12 errors)

Case 2: SNR at transmitter = 18.63 db

                       Error Rate 1%     Error Rate 0.1%
Random Sequence #1     6.99 db           16.41 db
Random Sequence #2     6.00 db           18.48 db
Random Sequence #3     8.53 db           13.72 db

8.4  Conclusion

The PARC algorithm was found to perform well in the presence of random
channel errors at error rates as high as one percent. The degree of
adaptation in the algorithm and its error performance seem to be related.
In general, greater adaptation of parameters and a more complex system lead
to poorer performance in the presence of channel errors. For example,
reducing the predictor order from 8 to 4 improved the error performance
significantly.

PARC employs a DPCM quantizer; hence, from a perceptual point of view, it is
more tolerant of randomly occurring bit errors than systems which
employ PCM quantizers (1), (2). This is because error spikes caused in
the reconstruction of a PCM waveform (due to a wrongly received bit) can
have maximum amplitudes of the order of the peak of the input signal,
while the corresponding spike magnitudes in DPCM decoding are related to
the peak value of the first difference of the input. The consequently greater
magnitude of a typical PCM error spike makes it more annoying, in spite
of the fact that it does not propagate in time. The effect of channel
errors on synchronization has not been investigated.

CHAPTER  9
FILTERING

9.1  Introduction

Subjective listening tests have indicated that the most objectionable
aspect of the speech generated at the PARC receiver is its granular noise
due to quantization errors. Since the spectrum of the quantization noise
n_q(k) will, in general, be whiter than that of the speech signal s(k), it
appears reasonable to develop a filter for removing part of the quantization
noise from the received speech and hence improving the speech.

Section 9.2 describes the pre-emphasis of speech to improve the
speech quality in terms of perception. Various choices of filters are
outlined in Sec. 9.3 and the results are presented in Sec. 9.4.

In the study of pre-emphasis of speech it was observed that low pass
filtering has a considerable effect on entropy. This effect is
discussed in Sec. 9.5 and simulation results are presented in Sec. 9.6.

The design of the low-pass Butterworth filter is outlined in the Appendix.


9.2  Evaluation  of  Pre-emphasis


For  auto-correlated  signals,  such  as  speech,  predictive  coding  [1,2] 
is  an  efficient  method  of  encoding  the  signal  into  digital  form.  In 
predictive  coders,  the  quantization  noise  depends  on  prediction  errors; 
hence,  efficient  prediction  minimizes  quantization  error.  However,  in 
some  segments  of  speech,  such  as  unvoiced  speech,  the  degree  of  correla¬ 
tion  is  small  and  prediction  is  therefore  poor,  resulting  in  more 
quantization  noise.  When  the  amplitude  of  this  noise  is  comparable  to 
the  speech  signal  it  mars  the  quality  of  the  received  speech.  To  reduce 
this  problem,  it  appeared  that  some  sort  of  pre-emphasis  of  speech  would 
be  helpful. 

The basic concept of a pre-emphasis filter, as shown in Fig. 9.1, is
to spread the energy in the input signal over the full bandwidth of the
processor. Since most of the energy in speech is in the lower end of
the spectrum, the pre-emphasis filter is a high-pass filter; hence, the
de-emphasis filter is a low-pass filter. As the pre-emphasis filter is a
high-pass filter, the unvoiced segments of speech get emphasized. However,
the filter design is such that the overall energy gain for a typical
phonetically balanced sentence is approximately unity.

From the figure, it is clear that pitch extraction is to be carried
out on the high-pass filtered speech to get the β's and T's. However, it was
observed that it makes little difference if the β's and T's are obtained
by pitch extraction on the original speech.

It  might  be  possible  to  manipulate  the  block  diagram  of  PARC  by 
moving  filters  inside  to  get  the  equivalent  system.  This  is  of  no 
immediate  interest  and  hence  not  covered  here.  However,  PARC  algorithm 
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could  be  modified  to  Include  adaptive  noise  spectral  shaping  as  proposed 
by  Makhoul  &  Berouti  [3]. 


Pre¬ 

emphasis 

PARC 

filter 

algorithm 

De- 
^  emphasis 
filter 


Fig.  9.1  Pre  and  De-emphasis  Filters  with  PARC  Algorithm 
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9. 3  Filter  Selection 

As mentioned earlier, the pre-emphasis filter is a high-pass filter
and the de-emphasis filter is just the inverse of the pre-emphasis filter.

In order to minimize complexity, the order of the filter is kept small.
It was decided to use a 3rd-order filter with the general form

$$ s_f(k) = \frac{1}{K}\,s(k) + \frac{a}{K}\,s(k-1) + \frac{b}{K}\,s(k-2) + \frac{c}{K}\,s(k-3) \tag{9.1} $$

The Z transform of this filter is

$$ H(z) = \frac{1 + a z^{-1} + b z^{-2} + c z^{-3}}{K} \tag{9.2} $$

Then, using the fact that z is just e^{jωT}, where T is the sampling
interval, the system function becomes

$$ H(e^{j\omega T}) = \frac{1 + a e^{-j\omega T} + b e^{-j2\omega T} + c e^{-j3\omega T}}{K} \tag{9.3} $$

and the squared magnitude of the system function is

$$ |H(e^{j\omega T})|^2 = \frac{1}{K^2}\Big\{ 1 + a^2 + b^2 + c^2 + [2a + 2b(a + c)]\cos\omega T + 2(b + ac)\cos 2\omega T + 2c\cos 3\omega T \Big\} \tag{9.4} $$


The four filters whose performance will be described in detail in the
next section are shown in Table 9.1.


Table 9.1
Filter Parameters

              K         a         b         c
filter 1    0.5113    -1.182    0.677    -0.140
filter 2    0.5718    -0.888    0.486    -0.095
filter 3    0.6718    -0.626    0.347    -0.064
filter 4    0.7311    -0.508    0.219    -0.053



The frequency response of these filters is shown in Fig. 9.2 for a
sampling rate of 6.4 kHz.
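
The shape of these responses can be checked numerically. The following is a minimal sketch, not part of the original simulation, that evaluates the squared magnitude of Eq. (9.4) for the four parameter sets of Table 9.1, including the 1/K scale factor of Eq. (9.2). The function names and the choice of test frequencies are assumptions made for illustration.

```python
import cmath
import math

FS_HZ = 6400.0  # sampling rate quoted in the text

# (K, a, b, c) copied from Table 9.1.
FILTERS = {
    "filter 1": (0.5113, -1.182, 0.677, -0.140),
    "filter 2": (0.5718, -0.888, 0.486, -0.095),
    "filter 3": (0.6718, -0.626, 0.347, -0.064),
    "filter 4": (0.7311, -0.508, 0.219, -0.053),
}


def gain_squared(K, a, b, c, theta):
    """Squared magnitude from the cosine expansion; theta = omega*T in radians."""
    return (1.0 + a * a + b * b + c * c
            + (2.0 * a + 2.0 * b * (a + c)) * math.cos(theta)
            + 2.0 * (b + a * c) * math.cos(2.0 * theta)
            + 2.0 * c * math.cos(3.0 * theta)) / (K * K)


def gain_squared_direct(K, a, b, c, theta):
    """Same quantity obtained by evaluating H(z) on the unit circle."""
    z_inv = cmath.exp(-1j * theta)
    h = (1.0 + a * z_inv + b * z_inv ** 2 + c * z_inv ** 3) / K
    return abs(h) ** 2


def gain_db(params, freq_hz, fs_hz=FS_HZ):
    """Gain in dB at freq_hz for a (K, a, b, c) tuple."""
    theta = 2.0 * math.pi * freq_hz / fs_hz
    return 10.0 * math.log10(gain_squared(*params, theta))


for name, params in FILTERS.items():
    gains = [round(gain_db(params, f), 2) for f in (0.0, 1600.0, 3200.0)]
    print(name, gains)
```

For every filter in the table the gain at 3200 Hz exceeds the gain at 0 Hz, which is the high-pass behaviour the text describes.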

The de-emphasis filter is just the inverse of the pre-emphasis
filter. Therefore, for the filter of Eq. (9.3) the inverse is

$$ s(k) = K\,s_f(k) - a\,s(k-1) - b\,s(k-2) - c\,s(k-3) \tag{9.5} $$
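
As an illustration of the inverse relationship, the sketch below applies the third-order pre-emphasis recursion and then the de-emphasis recursion of Eq. (9.5) to a test sequence. It assumes zero initial conditions and uses the filter 1 parameters of Table 9.1. The function names and the sinusoidal test input are hypothetical and are not taken from the FORTRAN simulation.

```python
import math


def pre_emphasis(s, K, a, b, c):
    """s_f(k) = [s(k) + a*s(k-1) + b*s(k-2) + c*s(k-3)] / K, zero initial state."""
    out = []
    for k in range(len(s)):
        s1 = s[k - 1] if k >= 1 else 0.0
        s2 = s[k - 2] if k >= 2 else 0.0
        s3 = s[k - 3] if k >= 3 else 0.0
        out.append((s[k] + a * s1 + b * s2 + c * s3) / K)
    return out


def de_emphasis(sf, K, a, b, c):
    """Eq. (9.5): s(k) = K*s_f(k) - a*s(k-1) - b*s(k-2) - c*s(k-3)."""
    s = []
    for k in range(len(sf)):
        s1 = s[k - 1] if k >= 1 else 0.0
        s2 = s[k - 2] if k >= 2 else 0.0
        s3 = s[k - 3] if k >= 3 else 0.0
        s.append(K * sf[k] - a * s1 - b * s2 - c * s3)
    return s


FILTER_1 = (0.5113, -1.182, 0.677, -0.140)  # (K, a, b, c) from Table 9.1

speech = [math.sin(0.3 * k) for k in range(200)]  # stand-in for real speech
restored = de_emphasis(pre_emphasis(speech, *FILTER_1), *FILTER_1)
worst = max(abs(x - y) for x, y in zip(speech, restored))
print("largest round-trip error:", worst)
```

In the coder itself the de-emphasis filter operates on the decoded signal rather than on the exact pre-emphasized signal, so the round trip is exact only in this idealized, quantizer-free setting.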


Fig. 9.2 Frequency Response of Pre-emphasis Filters
(horizontal axis: frequency in hertz)


9.4  Results 


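The four pre-emphasis filters shown in Fig. 9.2 were evaluated on
a PARC operating in the 9.6 kb/s mode. Sentence 1, "Cats and dogs each
hate the other," was used for the simulation. The following parameters
were computed to evaluate the performance of the filters.

$$ \mathrm{SNR(PARC)} = \frac{\sum_k s_f^2(k)}{\sum_k \big(s_f(k) - \hat{s}_f(k)\big)^2} \tag{9.6} $$

$$ \mathrm{SNR(overall)} = \frac{\sum_k s^2(k)}{\sum_k \big(s(k) - \hat{s}(k)\big)^2} \tag{9.7} $$

$$ \mathrm{SEGSNR} = \frac{1}{n}\sum_{i=1}^{n} \mathrm{SNR}_i \tag{9.8} $$

where n is the number of blocks, the block length being 120 samples in
this case, and SNR_i is the signal-to-noise ratio of the i-th block.

$$ \mathrm{SER\ (Signal\ Energy\ Reduction)} = -10 \log \frac{\sum_k v_f^2(k)}{\sum_k s_f^2(k)} \tag{9.9} $$

where v_f(k) = s_f(k) - β s_f(k-T).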
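
The measures above can be sketched in a few lines. This is a minimal illustration rather than the code used to produce the reported results. It assumes that the SNR values are expressed in dB through 10 log10, as the tabulated units suggest, and that SEGSNR is the plain average of the per-block dB values. The function names and the synthetic test signal are hypothetical.

```python
import math

BLOCK_LEN = 120  # block length used for SEGSNR in the text


def snr_db(ref, rec):
    """Ratio of signal energy to error energy, in dB (dB scaling is an assumption)."""
    signal = sum(x * x for x in ref)
    noise = sum((x - y) ** 2 for x, y in zip(ref, rec))
    return 10.0 * math.log10(signal / noise)


def segsnr_db(ref, rec, block_len=BLOCK_LEN):
    """Mean of the per-block SNRs over the n complete blocks."""
    n = len(ref) // block_len
    total = 0.0
    for i in range(n):
        lo, hi = i * block_len, (i + 1) * block_len
        total += snr_db(ref[lo:hi], rec[lo:hi])
    return total / n


def ser_db(sf, beta, pitch_period):
    """Signal energy reduction with v_f(k) = s_f(k) - beta * s_f(k - T)."""
    ks = range(pitch_period, len(sf))
    residual = sum((sf[k] - beta * sf[k - pitch_period]) ** 2 for k in ks)
    energy = sum(sf[k] ** 2 for k in ks)
    return -10.0 * math.log10(residual / energy)


# Synthetic "voiced" signal with a 40-sample period and a crude decoder model.
speech = [math.sin(2.0 * math.pi * k / 40.0) for k in range(480)]
decoded = [0.9 * x for x in speech]
print(snr_db(speech, decoded), segsnr_db(speech, decoded), ser_db(speech, 0.5, 40))
```

A block with zero error energy would make the ratio undefined, so a practical version would need to guard that case.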

It was noticed that with pre-emphasis the output speech is
perceptually better than without pre-emphasis. However, there is an
increase in entropy. This is because the increased amplitude of the
high-frequency speech components exercises the upper levels of the
quantizer more often, thus generating more bits. All results are reported
in Table 9.2.


Table 9.2

Signal to Noise Ratios
with Various Pre-emphasis Filters

             Product of Pre &                                               Entropy H
             De-emphasis        SNR                  SNR Whole              (bits/
Filter       Gain               PARC       SER       System      SEGSNR     sample)

no filters   -                  18.54 dB   5.43 dB   18.54 dB    11.02 dB   1.50
filter 1     1.003              15.63 dB   2.71 dB   16.73 dB     9.68 dB   1.97
filter 2     0.994              16.44 dB   3.58 dB   18.64 dB    11.02 dB   1.88
filter 3     0.999              17.41 dB   4.44 dB   19.36 dB    11.65 dB   1.74
filter 4     0.997              17.69 dB   4.73 dB   19.20 dB    11.55 dB   1.69


9.5 Low-pass Filtering vs. Entropy


For the 9.6 kb/s transmission rate and 6.4 kHz sampling frequency,
the number of bits per sample is 1.5. Transmission of parameters such
as β and T takes up part of this budget. Therefore, the entropy in the
simulation must be maintained at a value less than 1.5. In the previous
section, it was observed that pre-emphasis makes the output speech
perceptually better; however, it also increases entropy, which is
unacceptable. To overcome this problem one could use coarser quantization
to make the entropy small to begin with and then employ pre- and
de-emphasis filters. Unfortunately, the improvement in speech quality due
to the pre-emphasis operation is not significant enough to justify this
approach.

Another method for achieving good speech quality while controlling
the bit rate would be to select parameters such that the speech quality is
excellent, disregarding the increase in entropy, and to use the buffer
control to check the bit rate. How this buffer control works is discussed
in detail in the next chapter. The use of a filtering operation to control
the bit rate is discussed here.

If a low-pass filter is used instead of a high-pass filter as the
pre-emphasis filter, energy reduction is improved and as a result entropy
drops. Therefore, the buffer will fill at a slower rate and buffer control
would be used infrequently; consequently, there would be less degradation
caused by the use of buffer control. However, low-pass filtering with a
bandwidth less than 3200 Hz causes some loss of speech naturalness. It was
noticed that during high-energy, voiced segments of speech the bit rate is
higher and hence the buffer fills faster. If the bit rate is brought down
by employing low-pass filtering only after the buffer content exceeds a
particular threshold, a double purpose is served. First, the buffer filling
operation is slowed down, thus avoiding or delaying the drastic buffer
control operation. Second, high-frequency components in low-energy
unvoiced segments of speech are not filtered out, since the filter is in
operation only above the threshold, thus preserving natural quality.
This threshold was found by plotting the buffer content against time for
a typical sentence and noting the value of the buffer content for voiced
speech. For the bit buffer in the FORTRAN simulation of the PARC
algorithm, 500 bits appeared to be a reasonable threshold value.

As the cut-off frequency of the low-pass filter is decreased, the
performance of the predictor improves, thus decreasing the bit rate
further. Thus, the filter cut-off frequency can be decreased depending on
how full the buffer gets. Hence the pre-emphasis filter is now an adaptive
low-pass filter, with the adaptation of the cut-off frequency depending on
the buffer contents.

The low-pass filter used is a simple 3rd order Butterworth filter.
Its design and frequency plots are given in an appendix to this chapter.


9.6  Results  and  Conclusion 


The FORTRAN simulation of PARC was modified to include the adaptive
low-pass filter concept. To ensure that a different filter is not
selected for every sample of speech when the buffer contents are close
to the threshold, a hysteresis structure was utilized. Once a particular
filter is selected, it is employed in the algorithm for the next block of
100 samples. The buffer content is compared with the thresholds only after
the block of speech samples has been processed. This is shown in the flow
chart in Fig. 9.3. The simulation was carried out for Sentence 1, male
speaker. The effects of low-pass filtering on entropy and signal energy
reduction are tabulated in Table 9.5, while the effects on buffer content
and bit rates are reported in Table 9.6.
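
The block-wise selection rule described above can be sketched as follows. This is an illustration under stated assumptions and not a transcription of the flow chart: the threshold table pairs the 500-bit threshold with the 1400 Hz cut-off, both of which appear in this chapter but are not explicitly linked, and the buffer trace is supplied as a hypothetical input rather than produced by a coder.

```python
BLOCK_LEN = 100  # samples per block, as stated in the text

# (buffer threshold in bits, low-pass cut-off in Hz). Pairing the 500-bit
# threshold with the 1400 Hz cut-off is an assumption made for illustration.
THRESHOLDS = [(500, 1400.0)]


def select_cutoff(buffer_bits, thresholds=THRESHOLDS):
    """Return the cut-off for the highest threshold exceeded, or None (no filter)."""
    cutoff = None
    for threshold, fc_hz in sorted(thresholds):
        if buffer_bits > threshold:
            cutoff = fc_hz
    return cutoff


def cutoff_per_sample(buffer_trace, thresholds=THRESHOLDS, block_len=BLOCK_LEN):
    """buffer_trace[k] is the buffer content in bits after sample k is coded."""
    current = None  # no filtering until the first block has been processed
    schedule = []
    for k, bits in enumerate(buffer_trace):
        schedule.append(current)
        if (k + 1) % block_len == 0:  # compare only at block boundaries
            current = select_cutoff(bits, thresholds)
    return schedule


trace = [400] * 300
trace[50] = 600   # brief excursion inside the first block: ignored
trace[199] = 600  # above the threshold at the end of the second block
schedule = cutoff_per_sample(trace)
print(schedule[100], schedule[200])
```

Because the comparison is made only at block boundaries, a filter choice is held for a full block even if the buffer content crosses the threshold in between.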


Table 9.5

Effects of Low-pass Filtering

                            SNR         SER       Entropy H

no filter                   14.85 dB    4.1 dB    1.37 bits/sample
LPF with cut-off 1400 Hz    16.43 dB    5.1 dB    1.95 bits/sample


Table 9.6

Effect of LPF on Bit Rate

                            Sample        Change in         Increase       Bit
                            Number        Buffer Content    Bit/Sample     Rate*

no filter                   300 - 1000    800               1.14           2.49
LPF with cut-off 1400 Hz    300 - 1000    680               0.97           2.32

* Bit rate is obtained by adding 1.35 bits/sample to the bits/sample
increase in the buffer. This is because an average of 1.35 bits/sample
is transmitted on the digital channel.
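
The arithmetic behind the last two columns can be reproduced directly. The short sketch below assumes that the sample range 300 - 1000 spans 700 samples; the function name is hypothetical.

```python
CHANNEL_BITS_PER_SAMPLE = 1.35  # average drained by the channel, from the footnote


def bit_rate(first_sample, last_sample, buffer_change_bits):
    """Return (buffer increase per sample, generated bits per sample)."""
    n_samples = last_sample - first_sample
    increase = buffer_change_bits / n_samples
    return increase, increase + CHANNEL_BITS_PER_SAMPLE


for label, change in (("no filter", 800), ("LPF, 1400 Hz cut-off", 680)):
    increase, rate = bit_rate(300, 1000, change)
    print(label, round(increase, 2), round(rate, 2))
```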


Fig. 9.3 Hysteresis Structure of
Adaptive Low Pass Filter


The results in Tables 9.5 and 9.6 indicate that low-pass filtering
can be used to control the bit rate. In the real-time simulation on the
MAP, the structure of the program was such that a varying number of speech
samples would be processed each time in order to obtain an exact,
predetermined number of bits. This could cause double or triple filtering
of the same speech samples. This situation occurs in voiced segments of
speech.

The all-pole 3rd order filter could not be used, since double and triple
filtering causes the frequency response to have peaks at the 3 dB
frequency. This is undesirable, and hence there is a need to find new
methods of employing adaptive low-pass filtering for buffer control.

It was observed that multiple filtering reduces the 3 dB frequency. As
mentioned earlier, more bits are generated in voiced speech, and the filter
cut-off frequency has to be reduced to cut down the bit rate. Since
multiple filtering happens in voiced speech, and since this is the region
where the low-pass cut-off frequency needs to be decreased, the design of a
single filter would be enough. Thus, there would be no need to change the
filter as the buffer gets closer to being full; the reduction happens
automatically through multiple filtering. A filter which gives the required
change in 3 dB frequency upon repetition was designed, and its design and
frequency response are outlined in the appendix.


9.A Appendix: Derivation of Filters

9.A.1 Butterworth (Maximally flat) Filters

A  Butterworth  filter  is  designed  to  be  maximally  flat  at  the  origin 
of  the  magnitude  of  the  frequency  response,  i.e.,  the  filter  is  forced  to 
have  as  many  zero  derivatives  at  the  origin  of  the  magnitude  response  as 
possible. 

The normalized squared magnitude response of the Butterworth filters
is

$$ |H(\omega_p)|^2 = \frac{1}{1 + \omega_p^{2n}} \tag{9.8} $$

where ω_p = ω/ω_c is the normalized frequency for an nth order filter and
ω_c is the desired 3 dB cut-off frequency of the nth order filter.

In the design of the desired filter frequency response the poles of
the transfer function H(s) are needed. Then

$$ H(s)H(-s) = \frac{1}{1 + (-s^2)^n} = \begin{cases} 1/(1 + s^{2n}), & n \text{ even} \\ 1/(1 - s^{2n}), & n \text{ odd} \end{cases} \tag{9.9} $$

Thus, the 2n roots of -1 (n even) or of +1 (n odd) are desired, depending
on the evenness or oddness of the order of the desired filter. Consider
the third-order Butterworth filter; for n=3, the poles are the six roots
of s^6 = 1 (in units of ω_c),

$$ s = e^{jk\pi/3}, \quad k = 0, 1, \ldots, 5, \qquad \text{i.e.} \quad s = \pm 1,\ \pm\tfrac{1}{2} \pm j\tfrac{\sqrt{3}}{2} $$

These poles are plotted in Fig. 9.4.

Since the poles of the magnitude squared transfer function are
symmetrically placed, the poles which fall in the left half plane are
assigned to H(s) for physical realizability.

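Thus

$$ H(s) = \frac{1}{\prod_{i=1}^{3} (s - s_i)} \tag{9.10} $$

where s_i are the left half plane poles of the magnitude squared transfer
function. In the third order case

$$ s_1 = \Big(-\tfrac{1}{2} - j\tfrac{\sqrt{3}}{2}\Big)\omega_c, \qquad s_2 = \Big(-\tfrac{1}{2} + j\tfrac{\sqrt{3}}{2}\Big)\omega_c, \qquad s_3 = -\omega_c $$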

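The equivalent transfer function for the digital filter becomes

$$ H(z) = \frac{K z^3}{(z - p_1)(z - p_2)(z - p_3)} \tag{9.11} $$

where

$$ p_1 = e^{s_1 T} = \exp\Big[\Big(-\tfrac{1}{2} - j\tfrac{\sqrt{3}}{2}\Big)\omega_c T\Big] $$

$$ p_2 = e^{s_2 T} = \exp\Big[\Big(-\tfrac{1}{2} + j\tfrac{\sqrt{3}}{2}\Big)\omega_c T\Big] $$

$$ p_3 = e^{s_3 T} = \exp[-\omega_c T] $$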


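H(z) can also be written as

$$ H(z) = \frac{K}{1 + a z^{-1} + b z^{-2} + c z^{-3}} \tag{9.12} $$

Comparing (9.11) and (9.12), with ω_c T = 2π f_c / f_s,

$$ a = -e^{-\pi f_c/f_s}\Big[e^{-\pi f_c/f_s} + 2\cos\Big(\sqrt{3}\,\pi \frac{f_c}{f_s}\Big)\Big] $$

$$ b = e^{-2\pi f_c/f_s}\Big[1 + 2 e^{-\pi f_c/f_s}\cos\Big(\sqrt{3}\,\pi \frac{f_c}{f_s}\Big)\Big] $$

$$ c = -e^{-4\pi f_c/f_s} $$

where f_c is the cut-off frequency in Hz and
f_s is the sampling frequency in Hz.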
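
The closed-form coefficients can be cross-checked by expanding the pole product of (9.11) numerically. The sketch below does both and compares them. It takes b as positive, which is what the expansion of (z - p_1)(z - p_2)(z - p_3) gives. The function names and the 1400 Hz example are assumptions for illustration.

```python
import cmath
import math


def closed_form_coefficients(fc_hz, fs_hz):
    """a, b, c of the denominator 1 + a z^-1 + b z^-2 + c z^-3 in closed form."""
    r = math.pi * fc_hz / fs_hz
    cos_term = math.cos(math.sqrt(3.0) * r)
    a = -math.exp(-r) * (math.exp(-r) + 2.0 * cos_term)
    b = math.exp(-2.0 * r) * (1.0 + 2.0 * math.exp(-r) * cos_term)
    c = -math.exp(-4.0 * r)
    return a, b, c


def coefficients_from_poles(fc_hz, fs_hz):
    """Same coefficients obtained by mapping the s-plane poles with p = exp(s*T)."""
    wc_t = 2.0 * math.pi * fc_hz / fs_hz  # omega_c * T
    s_poles = [complex(-0.5, -math.sqrt(3.0) / 2.0),
               complex(-0.5, math.sqrt(3.0) / 2.0),
               complex(-1.0, 0.0)]
    p1, p2, p3 = (cmath.exp(s * wc_t) for s in s_poles)
    a = -(p1 + p2 + p3)
    b = p1 * p2 + p1 * p3 + p2 * p3
    c = -(p1 * p2 * p3)
    return a.real, b.real, c.real


print(closed_form_coefficients(1400.0, 6400.0))
print(coefficients_from_poles(1400.0, 6400.0))
```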


The frequency responses of various Butterworth filters are plotted in Fig. 9.5.


9.A.2 Design of the low-pass filter used in the algorithm

As mentioned in the earlier sections of this chapter, low-pass
filtering was seen as a method of softly degrading the speech to avoid
overflowing the speech buffer. This low-pass filter had to be designed,
however, to operate in the MAP.

The first design of the necessary digital filter was done using the
impulse invariant transformation on a third order Butterworth filter.
Unfortunately, the resulting digital filter had a relatively large amount
of ripple, which was not desirable, especially because multiple filtering
was desired. To assure monotonicity, then, the digital filter was
redesigned using the conformal bilinear transform. Also, it appeared that,
for ease of implementation in the MAP, the digital filter should have only
one zero in the z-plane. Thus, the general form of the transfer function
of the digital filter was

$$ H(z) = \frac{\omega_{CA} + \omega_{CA}\, z^{-1}}{(\omega_{CA} + 1) + (\omega_{CA} - 1)\, z^{-1}} \tag{9.13} $$

where

$$ \omega_{CA} = \tan\Big(\frac{\pi}{6400} f_c\Big), $$
f_c = filter cutoff frequency (Hz).

The frequency response of such a filter with an 1800 Hz cutoff is shown in
Fig. 9.6. The frequency response for double filtering is also shown,
and it can be seen that the cutoff frequency is then about 1350 Hz.
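
The shift of the 3 dB point under repeated filtering can be worked out from (9.13). On the unit circle the squared magnitude of this filter is 1 / (1 + (tan(πf/f_s) / ω_CA)^2), so m passes reach the half-power point where tan(πf/f_s) = ω_CA sqrt(2^(1/m) - 1). The sketch below evaluates this, assuming the 6400 Hz sampling rate used in the definition of ω_CA; the function names are hypothetical.

```python
import math

FS_HZ = 6400.0  # sampling rate implied by the definition of omega_CA


def omega_ca(fc_hz, fs_hz=FS_HZ):
    """Pre-warped cut-off used in Eq. (9.13)."""
    return math.tan(math.pi * fc_hz / fs_hz)


def gain_squared(f_hz, fc_hz, passes=1, fs_hz=FS_HZ):
    """|H|^2 of Eq. (9.13) applied `passes` times, evaluated at f_hz."""
    w = omega_ca(fc_hz, fs_hz)
    theta = 2.0 * math.pi * f_hz / fs_hz
    z_inv = complex(math.cos(theta), -math.sin(theta))
    h = (w + w * z_inv) / ((w + 1.0) + (w - 1.0) * z_inv)
    return abs(h) ** (2 * passes)


def cutoff_after_passes(fc_hz, passes, fs_hz=FS_HZ):
    """Frequency at which the cascade of `passes` identical filters is 3 dB down."""
    x = math.sqrt(2.0 ** (1.0 / passes) - 1.0)
    return fs_hz / math.pi * math.atan(omega_ca(fc_hz, fs_hz) * x)


for m in (1, 2, 3):
    print(m, round(cutoff_after_passes(1800.0, m), 1))
```

For a nominal 1800 Hz cutoff the two-pass value lands close to the 1350 Hz read from the plot, and a third pass lowers it further.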


CHAPTER 10
BUFFER  CONTROL 


10.1  Introduction 

Even  with  the  use  of  adaptive  low  pass  filtering,  as  described  in 
the  previous  chapter,  to  decrease  bit-rate  generation  when  the  speech 
buffer  approaches  overflow,  there  still  remains  the  ultimate  problem  of 
deciding  what  to  do  to  prevent  the  buffer  from  overflowing.  Similarly, 
there  is  also  the  problem  of  deciding  what  to  do  to  prevent  the  buffer 
from  underflowing.  These  two  problems  are  considered  in  this  chapter  on 
buffer  control. 


10.2  Overflow  Control 




To prevent buffer overflow, a method had to be found to sharply
limit the bit generation rate occasionally without causing an
unreasonable amount of distortion. Because the quantizer employs feedback
and is backward adaptive, it is possible to obtain buffer control by
denying the use of certain quantizer levels (or by selectively permitting
the use of additional quantizer levels), or by varying the decision
thresholds for the quantizer levels. A number of simulation runs were
made of both of these techniques without a great deal of success. It
appeared that dropping or adding levels was too crude to be an effective
control: it produced either a very small change in bit rate generation
or a very pronounced change. Varying the decision thresholds did not
appear to be very useful, either, because it typically caused too much
distortion.

As a result of these investigations, the problem was approached again
from a different angle. Analysis of simulation results showed that the
speech buffer was most prone to overflow during instances of voiced speech.
This suggested that pitched repetition might be a useful solution.

Pitched repetition relies on the large amount of correlation between
pitch periods of voiced speech. In pitched repetition, samples are generated
by duplicating the samples from one pitch period earlier. Due to the large
amount of correlation, pitched repetition can typically be carried out for
short periods of time during voiced speech without greatly affecting the
subjective quality of the speech.
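
The duplication step itself is simple, as the following sketch shows. It works on a plain list of samples and is meant only to illustrate the idea; how the real-time system applies repetition inside the coder loop is described elsewhere in the report, and the function name here is hypothetical.

```python
def pitched_repetition(history, pitch_period, count):
    """Generate `count` samples, each copied from one pitch period earlier."""
    if pitch_period <= 0 or pitch_period > len(history):
        raise ValueError("pitch period must lie within the available history")
    extended = list(history)
    for _ in range(count):
        extended.append(extended[-pitch_period])
    return extended[len(history):]


# A perfectly periodic toy signal with a period of 5 samples.
history = [0, 1, 2, 3, 4, 0, 1, 2, 3, 4]
print(pitched_repetition(history, pitch_period=5, count=7))
```

Because each new sample is taken from the extended sequence, repetition can continue for longer than one pitch period.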

Details of the implementation of pitched repetition are given in
Chapter 2.


10.3 Underflow Control

It is just as important to prevent the transmitter sample buffer
from underflowing as it is to prevent it from overflowing. This is
because when the transmitter sample buffer underflows, the receiver sample
buffer overflows. Thus, some method was needed to prevent the transmitter
sample buffer from underflowing.

A relatively easy solution to this problem was found by the use of
"null" quantizer levels. This technique involves the transmission of a
specified bit pattern, just like a normal quantizer level, except that it
causes nothing to happen and is discarded at the receiver, with nothing
being placed in the receiver sample buffer. Null quantizer levels are
used as necessary, then, to prevent the transmitter sample buffer from
underflowing.
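
A toy model of the null-level mechanism is sketched below. It works at the level of whole symbols rather than bits and ignores the variable-length code, so it should be read only as an illustration of the idea. The NULL marker and both function names are hypothetical stand-ins for the reserved bit pattern and the real buffer logic.

```python
NULL = "NULL"  # hypothetical stand-in for the reserved "null" bit pattern


def transmit(levels_waiting, channel_slots):
    """Fill every channel slot, sending NULL whenever no level is waiting."""
    queue = list(levels_waiting)
    sent = []
    for _ in range(channel_slots):
        sent.append(queue.pop(0) if queue else NULL)
    return sent


def receive(symbols):
    """Discard nulls so that nothing is placed in the receiver sample buffer."""
    return [s for s in symbols if s != NULL]


sent = transmit([3, 1, 1, 2], channel_slots=6)
print(sent)
print(receive(sent))
```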


10.4  Special  Considerations  at  the  Receiver 


In the error-free condition, it is possible to control both the
transmitter and receiver sample buffers by controlling just the transmitter
sample buffer. In the presence of errors, however, this is no longer the
case. For example, due to an error, it would be possible for the receiver
sample buffer to underflow without the transmitter sample buffer
overflowing. Some simple rules were developed to handle this situation and
to resynchronize the buffers. If the receiver sample buffer overflows, the
most recent sample is discarded, since it probably represents silence or
near silence. This seems to cause the least distortion, and resynchronizes
the buffers. If the receiver sample buffer underflows, a quantizer level
"1" is inserted. This again causes a minimum of distortion and
resynchronizes the buffers.


10.5 Conclusions and Suggestions for Further Research

It  is  necessary  to  provide  buffer  control  to  prevent  potentially 
catastrophic  conditions  due  to  overflow  or  underflow.  Some  relatively 
simple,  but  effective,  strategies  for  buffer  control  have  been  developed 
to  this  end. 

There  is,  of  course,  room  for  improvement.  For  example,  the  null 
quantizer  levels  could  also  be  used  to  force  resynchronization  of  the 
transmitter  and  receiver  sample  buffers.  Even  more  basic  questions  exist 
about  the  problem  of  buffer  control  itself,  because  it  would  seem  that  the 
quantization  system  is  not  as  efficient  as  it  could  be  if  it  regularly 
runs into overflow and underflow. A related question is why pitched
repetition, which takes few bits to transmit, is so effective at a time
when the quantizer is operating at a high bit generation rate.


CHAPTER 11

SOURCE  AND  ERROR  CONTROL  CODING 

11.1.  Introduction 

Source and error control coding are the interface between the internal
variables of the system and the communications channel. The noiseless source
coder performs the first step in generating the bits to be transmitted.
Its goal is to try to convey all the necessary information using a minimum
of bits. The error control coding is then used to increase the probability
that these bits will be received without error. Due to differences in
quantity and importance, though, the quantizer levels and the side
information are handled in different ways by the source and error control
coders.


11.2. Source Coding

11.2.1.  Quantizer  Levels 

In most common quantization systems, a very simple source coding
procedure is used: typically there are 2^N quantizer levels used by the
system, and each quantizer level is coded as an N-bit binary number. In
contrast, PARC uses a more sophisticated source coding procedure in order
to increase performance. The source code used to encode the 11 quantizer
levels and the "null" level uses a variable number of bits to represent the
levels, with fewer bits being used for the more common levels. In this
way, the quantizer level information can be conveyed very efficiently.
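
The benefit of a variable-length code over a fixed-length one can be illustrated with a standard Huffman construction. This is a familiar stand-in and not the PARC source code, whose codewords are not listed in this section. The probabilities below are hypothetical values for the 11 quantizer levels plus the "null" level, chosen only to be skewed toward a few common levels.

```python
import heapq
import math


def huffman_lengths(probs):
    """Return the codeword length of each symbol in a binary Huffman code."""
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tie_breaker = len(probs)  # keeps tuple comparison away from the lists
    while len(heap) > 1:
        p1, _, group1 = heapq.heappop(heap)
        p2, _, group2 = heapq.heappop(heap)
        for symbol in group1 + group2:
            lengths[symbol] += 1  # one more bit for every merged symbol
        heapq.heappush(heap, (p1 + p2, tie_breaker, group1 + group2))
        tie_breaker += 1
    return lengths


# Hypothetical probabilities for 11 quantizer levels plus the "null" level.
PROBS = [0.30, 0.20, 0.20, 0.08, 0.08, 0.04, 0.04, 0.02, 0.02,
         0.005, 0.005, 0.01]

lengths = huffman_lengths(PROBS)
average_bits = sum(p * n for p, n in zip(PROBS, lengths))
entropy_bits = -sum(p * math.log2(p) for p in PROBS)
fixed_bits = math.ceil(math.log2(len(PROBS)))
print(lengths, round(average_bits, 3), round(entropy_bits, 3), fixed_bits)
```

With twelve symbols a fixed-length code needs 4 bits per level, while the variable-length code averages fewer bits because the common levels receive short codewords.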

The  first  attempts  at  designing  a  simple  variable  length  source  code, 
however,  proved  frustrating.  Simulations  showed  that  a  simple  variable 
length  source  code  tended  to  cause  the  sample  buffer  to  fill  rapidly  during 
segments  of  voiced  speech.  Analysis  of  the  situation  showed  that  the  problems 
appeared  to  stem  from  the  fact  that  the  quantizer  levels  were  not  stationary 
or  independent.  Instead,  analysis  showed  that  the  quantizer  levels  were  better 
represented  by  a  model  where  the  levels  were  generated  by  switching  between 
two  sources,  one  representing  the  quantizer  behavior  during  voiced  segments, 
and  another  representing  the  quantizer  behavior  during  unvoiced  and  silent 
segments. 

In order to take advantage of this phenomenon, then, a new variable length
source code was developed. This new source code is what is known as an
overfull source code. Overfull codes are characterized by an ability to
encode some sequences in more than one way. For example, in the final source
code, a sequence of 14 level-1's could be encoded in either of two ways.
This redundancy would appear, at first glance, to decrease the efficiency
of the code, but it in fact increases the efficiency of the code. In
particular, this redundancy is what allows the source code to perform
well with a bimodal source. This was accomplished by designing most of
the code using the high-entropy (voiced) statistics, and then adding the
long-run codeword based on the low-entropy statistics. In this way, the
overfull code performs better than a non-overfull code could.


11.2.2. Side Information

Besides encoding the quantizer levels, the source coder must also
encode the side information used in the PARC system. This information
consists primarily of the pitch extraction coefficient β and the pitch
period T. (Pitched repetition is also signaled by the use of a false β.)
This encoding is performed for every frame of samples.

The encoding used is fairly straightforward. The pitch period can
only be one of 64 possible integers between 20 and 83, so that it can be
represented exactly by 6 bits. The encoding used for the pitch extraction
coefficient β, however, is slightly more complicated. The first complication
is that the value of β must be quantized because it is a real number. It
was determined by simulation, though, that the system was relatively
insensitive to the quantization of β, and that using 97 quantization levels,
evenly distributed between -2 and 2, appeared to have a negligible effect.
The other complication was the signaling of pitched repetition through the
use of a false β. The signaling itself could be handled easily by simply
assigning it an unused β value. It was felt, however, that it was important
that this signal not be mistaken. The encoding used for β, then, used 7
bits, with the all-zero pattern reserved for pitched repetition. Error
suppression was then provided by not assigning any β quantizer levels to
the patterns containing one or two 1's. This left 99 patterns for β
quantizer levels, while protecting the pitched repetition signal from
single bit errors.
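
The counts quoted above are easy to confirm, and the encoding can be sketched as follows. The text does not say which 7-bit pattern is assigned to which β level, so the ascending assignment used here is an assumption, as are the function names and the rounding rule of the quantizer.

```python
PITCH_MIN, PITCH_MAX = 20, 83   # pitch period range from the text
BETA_LEVELS = 97                # levels evenly spaced on [-2, 2]
REPETITION_PATTERN = 0b0000000  # all-zero pattern signals pitched repetition


def beta_patterns(bits=7, min_ones=3):
    """7-bit patterns that contain at least three 1's, in ascending order."""
    return [p for p in range(1 << bits) if bin(p).count("1") >= min_ones]


def quantize_beta(beta):
    """Index (0..96) of the nearest of the 97 levels, clipped to [-2, 2]."""
    step = 4.0 / (BETA_LEVELS - 1)
    index = int(round((beta + 2.0) / step))
    return max(0, min(BETA_LEVELS - 1, index))


def encode_side_info(beta, pitch_period):
    """Return (7-bit beta pattern, 6-bit pitch code); pattern order is assumed."""
    if not PITCH_MIN <= pitch_period <= PITCH_MAX:
        raise ValueError("pitch period outside 20..83")
    return beta_patterns()[quantize_beta(beta)], pitch_period - PITCH_MIN


print(len(beta_patterns()), PITCH_MAX - PITCH_MIN + 1)
print(encode_side_info(0.75, 57))
```

Every usable pattern differs from the all-zero pattern in at least three bit positions, so one or two bit errors cannot turn a β value into the repetition signal or the reverse.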
11.3. Error Control Coding

In order to maintain system performance when using a communications
channel with a relatively high bit error rate, error control coding is
provided for each block of bits. There were several constraints, though,
which dictated what kind of error control coding could be used. The
blocks were required by the pitch information to be about 200 bits long.
The block length was also constrained by synchronization requirements and
by the lengths which simple coding schemes require. All of the constraints
were satisfied by performing the coding over partial blocks, rather than an
entire block at a time. The 189 bit block is divided into three 63 bit
frames, so that a single-error-correcting (63, 57) Hamming code may be
used. As a result, up to 3 bit errors per block can be corrected (one in
each frame), greatly improving the performance of the system in a severe
environment.
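
A single-error-correcting Hamming code of this length can be sketched compactly. The version below uses the textbook arrangement in which bit positions are numbered 1 to 63 and the six parity bits sit at the power-of-two positions. Whether the real-time system orders its bits this way is not stated, so the layout and the function names are assumptions.

```python
N, K = 63, 57
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32]
DATA_POSITIONS = [p for p in range(1, N + 1) if p not in PARITY_POSITIONS]


def syndrome(frame):
    """XOR of the (1-based) positions of all 1 bits; zero for a valid codeword."""
    s = 0
    for position in range(1, N + 1):
        if frame[position - 1]:
            s ^= position
    return s


def encode(data_bits):
    """Place 57 data bits and set the parity bits so that the syndrome is zero."""
    if len(data_bits) != K:
        raise ValueError("expected 57 data bits")
    frame = [0] * N
    for position, bit in zip(DATA_POSITIONS, data_bits):
        frame[position - 1] = bit
    s = syndrome(frame)
    for p in PARITY_POSITIONS:
        if s & p:
            frame[p - 1] = 1
    return frame


def decode(received):
    """Correct at most one bit error and return the 57 data bits."""
    frame = list(received)
    s = syndrome(frame)
    if s:
        frame[s - 1] ^= 1  # the syndrome is the position of the bad bit
    return [frame[position - 1] for position in DATA_POSITIONS]


data = [(i * 5 // 3) % 2 for i in range(K)]
frame = encode(data)
frame[40] ^= 1  # inject a single channel error
print(decode(frame) == data)
```

Three such frames make up one 189-bit block, so one error in each frame can be corrected independently. Two errors in the same frame are beyond what this code can repair.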
11.4. Conclusions and Suggestions for Further Research

The source and error control coding described in this chapter
allows the system to perform efficiently and cope with severe communication
environments. The schemes described were rather simple, as appeared to be
required for this implementation. There are, of course, more sophisticated
methods which could be studied and which might improve performance even
further. A major question is how to develop quantization schemes which
allocate bits efficiently according to subjective criteria. Another major
issue is how to design error protection systems which perform well over a
large range of bit error rates.


APPENDIX  A 

FORTRAN  SIMULATION  OF  ALGORITHM 


This appendix presents a listing of the FORTRAN simulation of the PARC
algorithm. This simulation differs from the real-time algorithm in two
ways. First, this algorithm operates on a block containing a fixed
number of samples rather than a fixed number of bits. Second, the
algorithm described here does not have the adaptive filtering of the
input sample sequence.


COMMAND FILE TO BUILD THE TASK FOR PARC SIMULATION


APPENDIX B

SEGMENTED  SNR  PLOTS 


B.  1.  Introduction 

A number of similar programs were developed to aid in the analysis
of the performance of PARC. The purpose of the programs was to generate
plots, called SNRSIG, which indicate graphically the short-time
performance of the system versus the short-time signal level. This
information proved useful, as it appears that the average performance of
the system over short periods of time is more indicative than the overall
average performance.

There  are  three  groups  of  programs,  with  two  programs  in  each  group. 
The  first  program  in  each  group  performs  the  short-time  analysis,  and 
generates  the  data  to  be  plotted.  The  second  program  then  takes  the  data 
and  generates  the  actual  plot. 

The first group, DBCALC and SNRSIG, deals with the short-time
signal-to-noise ratio. This group is useful in analyzing the performance of
the quantizer, especially problems like slope overload noise and granular
noise. The second group, BFCALC and BFPLOT, deals with the average buffer
length, and the third group, BFDIFF and BFDFPL, deals with the difference in
the average buffer length. These programs are useful in analyzing the
performance of the source coding and of the buffer control.




PURPOSE * TO CALCULATE THE LOCAL SIGNAL STRENGTH AND THE LOCAL
SNR FOR A SPEECH FILE AND AN SHAT FILE.

      NX=VNY
      NY=VNX
      CALL GRID(X,Y,NX,X0,NY,Y0,LMASK1)


APPENDIX C

PLOTTING  PROGRAM 


This program is used to plot a data file or the difference of two
data files on a Versaplot 07 system. It employs the Versaplot-07 PPEP
Software Package.

Two options are provided by this program:

1. It can plot a whole data file. The ranges of the x-axis and y-axis
are specified by the user through a terminal.

2. It can plot a number of sections of a data file. The starting
location and number of sections can be specified by the user
through a terminal. However, the size of a section is fixed at
1600 samples.

Data must be stored in 16I5 format with a standard format header card.
Details are shown in the program listing.

      CALL CLOSE(1)
      STOP
      END


APPENDIX D

PDP-11 D/A PROGRAMS

This set of D/A programs is used for digital-to-analog conversion
employing DATEL's ST-PDP device, which is an analog I/O module
for DEC PDP-11 minicomputers. The ST-PDP device has three different modes:

1.  Program  Control  Interface 

2.  Interrupt  Serviced  Interface 

3.  Direct  Memory  Address 

The  set  of  D/A  programs  uses  the  Program  Control  Interface  mode. 

The  set  consists  of  six  modules: 

1. DA.CMD - The indirect command file to execute the D/A
converting operation.

2. DASP.FTN - The Fortran file to output a speech data file to
the D/A converter.

3. DARAMP.FTN - The Fortran file to output a ramp function to the
D/A converter in order to check the timing and operation.

4. DA64.MAC - The MACRO-11 assembly language subroutine for
6400 samples per second sampling frequency.

5. DA80.MAC - The MACRO-11 assembly language subroutine for
8000 samples per second sampling frequency.

6. COMMON.MAC - The MACRO-11 assembly language file used to build
a common device block inside the PDP-11 operating system.

This set of programs allows a user to output a data file from disk to
the D/A converter. Data must be stored in 16I5 format with a standard format
header card. Details are shown in the program listing.
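
For readers unfamiliar with FORTRAN formats, the data layout can be sketched as follows, reading the format named above as the FORTRAN descriptor 16I5, that is, sixteen integers per record, each right-justified in a field five characters wide. The layout of the header card is not described here, so the sketch simply skips one leading record; that choice and the function names are assumptions.

```python
FIELDS_PER_RECORD = 16
FIELD_WIDTH = 5


def format_16i5(samples):
    """Write integers as records of up to sixteen 5-character fields."""
    records = []
    for i in range(0, len(samples), FIELDS_PER_RECORD):
        chunk = samples[i:i + FIELDS_PER_RECORD]
        records.append("".join("%5d" % value for value in chunk))
    return records


def parse_16i5(records, header_records=1):
    """Read integers back, skipping `header_records` leading records (assumed)."""
    samples = []
    for record in records[header_records:]:
        record = record.rstrip("\n")
        limit = min(len(record), FIELDS_PER_RECORD * FIELD_WIDTH)
        for start in range(0, limit, FIELD_WIDTH):
            field = record[start:start + FIELD_WIDTH]
            if field.strip():  # blank fields at the end of a record are ignored
                samples.append(int(field))
    return samples


values = list(range(-20, 20))
records = ["HEADER CARD (layout not specified here)"] + format_16i5(values)
print(records[1])
print(parse_16i5(records) == values)
```

Values that need more than five characters, sign included, do not fit this format and are outside the scope of the sketch.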




A common device block has to be built the first time the D/A device
is used. The procedure is shown below. Further details can be found
in the I/O Driver Reference Manual of the PDP-11 RSX-11M operating system.

1. Logon a privileged UIC: For future reference, use UIC = [3,1]

2. > MAC COMMON = COMMON

(underline means the prompt of the computer)

3. > SET /UIC = [1,1]

4.  >  TKB 

TKB>  COMMON /MM,  LP:,  ST:  COMMON/PI/-HD  =  [3,1]  COMMON 
TKB>  / 

ENTER  OPTIONS: 

TKB>  PAR  =  COMMON:  0:16000 
TKB>  STACK  =  0 
TKB>  1 

5.  Logoff 

The  procedure  to  execute  the  D/A  modules  is  as  follows: 

1.  Logon  a  privileged  UIC 

2.  Execute  the  indirect  command  file  DA.CMD  (i.e.,  TYPE  @  DA) 

During the execution, the D/A modules will ask for additional information to
set up the D/A operation. It will also set the CPU at the highest hardware
priority, i.e., it occupies the CPU. Thus, it will suspend other users'
programs and stop the real-time clock. After the D/A operation, it will
restart other users' programs and the real-time clock.

The  program  listings  are  as  follows: 




FORTRAN  IV-FLUS  V02-S1  ll>22t3*  31-MAR-I0  FACE 

DASF.FTN  /TRl BLOCKS /UR 
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